Nonparametric Estimation of Optimal
Performance Criteria in Quality Engineering


R.J. Carroll
Department of Statistics
Texas A&M University
College Station, TX 77853
USA

Peter Hall
Department of Statistics
Australian National University
Canberra, ACT 2601
Australia

Abstract 

Box (1987) and Leon et al. (1987) discuss the problem of closeness to target in quality engineering. If the mean response $f(x,z)$ depends on $(x,z)$, the variance function is a PERMIA if it is $g(z)$, i.e., depends only on $z$. The goal is to find $(x_0, z_0)$ which minimizes variance while achieving a target mean value. We pose and answer the question: for given smoothness assumptions about $f$ and $g$, how accurately can we estimate $x_0$ and $z_0$? As part of the investigation, we also find optimal rates of convergence for estimating $f$, $g$ and their derivatives.

Keywords: Nonparametric Regression; Performance Measure; PERMIA; Quality Control; Taguchi's Method; Variance Function Estimation.
1. Introduction

We investigate estimation of optimal policies in what Box (1987) calls the problem of "closeness to target" in quality engineering; see also Leon et al. (1987) and Taguchi and Wu (1985). System variability is governed by a control factor $z$, so that observations have variance function $g(z)$. System mean is governed not only by the control factor $z$ but also by a signal factor $x$, so that observations have mean function $f(x,z)$. In the terminology of Leon et al. (1987), the variance function $g(z)$ is a PERMIA. As in Box (1987), the goal is to find the control setting $z_0$ which minimizes $g$, and to find the signal setting $x_0$ for which $f(x_0, z_0) = \tau_0$, where $\tau_0$ is a prespecified target value.

In practice, $f$ and $g$ would usually be unknown, and so we sample a variety of signal factors and control factors to produce estimators $\hat f$ and $\hat g$ of $f$ and $g$, respectively. Choose $\hat z_0$ to minimize $\hat g$, and, given $\hat z_0$, choose $\hat x_0$ so that $\hat f(\hat x_0, \hat z_0) = \tau_0$. Interest in this paper focuses on the case where $f$ and $g$ cannot be specified parametrically. We pose and answer the question: for given smoothness assumptions about $f$ and $g$, how accurately can we estimate $x_0$ and $z_0$?

Some insight into the problem may be obtained by simple Taylor expansion, as follows. Assume $f$ and $g$ have one and two continuous derivatives, respectively. Then it is reasonable to suppose that $\hat f$ and $\hat g$ satisfy those smoothness conditions. Since $g'(z_0) = \hat g'(\hat z_0) = 0$,

$$0 = \hat g'(\hat z_0) = \hat g'(z_0) + (\hat z_0 - z_0)\,\hat g''(\tilde z_0) = \hat g'(z_0) - g'(z_0) + (\hat z_0 - z_0)\,\hat g''(\tilde z_0),$$

where $\tilde z_0$ lies between $z_0$ and $\hat z_0$. Therefore

$$\hat z_0 - z_0 = -\{\hat g'(z_0) - g'(z_0)\}/\hat g''(\tilde z_0). \tag{1.1}$$


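Likewise, since $f(x_0, z_0) = \hat f(\hat x_0, \hat z_0) = \tau_0$, and writing $f^{(i,j)}$ for $\partial^{i+j} f/\partial x^i \partial z^j$ (and similarly for $\hat f$),

$$\begin{aligned}
\tau_0 = \hat f(\hat x_0, \hat z_0) &= \hat f(x_0, \hat z_0) + (\hat x_0 - x_0)\,\hat f^{(1,0)}(\tilde x_0, \hat z_0)\\
&= \tau_0 + \hat f(x_0, z_0) - f(x_0, z_0) + (\hat z_0 - z_0)\,\hat f^{(0,1)}(x_0, \tilde z_0) + (\hat x_0 - x_0)\,\hat f^{(1,0)}(\tilde x_0, \hat z_0),
\end{aligned}$$

where $\tilde x_0$ lies between $x_0$ and $\hat x_0$, and $\tilde z_0$ lies between $z_0$ and $\hat z_0$. Therefore

$$\hat x_0 - x_0 = -\frac{\hat f(x_0, z_0) - f(x_0, z_0)}{\hat f^{(1,0)}(\tilde x_0, \hat z_0)} - (\hat z_0 - z_0)\,\frac{\hat f^{(0,1)}(x_0, \tilde z_0)}{\hat f^{(1,0)}(\tilde x_0, \hat z_0)}. \tag{1.2}$$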

From equations (1.1) and (1.2) we conclude that (i) if $g^{(2)}(z_0)$ is nonzero then $z_0$ can be estimated with the same accuracy as $g^{(1)}(z_0)$; and (ii) if $f^{(1,0)}(x_0,z_0)$, $f^{(0,1)}(x_0,z_0)$ and $g^{(2)}(z_0)$ are nonzero then $x_0$ can be estimated with the worst of the accuracies with which $f(x_0,z_0)$ and $g^{(1)}(z_0)$ can be estimated. In the pathological event that one or other of these functions should be zero, higher-order Taylor expansions must be investigated.

Thus, estimation of $x_0$ and $z_0$ reduces to estimation of $f$, $g$ and derivatives of those functions. Inference about the mean, $f$, is a classic nonparametric regression problem, but not so inference about the variance, $g$. We require an estimate of the mean before we can estimate the variance, and interest centres on the effect which not knowing $f$ has on our ability to estimate $g$.

We now discuss convergence rates obtainable from (1.1) and (1.2). Suppose $f$ has $\nu_1$ derivatives and $g$ has $\nu_2$ derivatives. We allow $\nu_1$ and $\nu_2$ to be arbitrary positive numbers, since fractional derivatives may be expressed in terms of Lipschitz conditions. (See the second paragraph of Section 2 for definitions.) The argument leading to (1.1) and (1.2) requires at least one derivative of $f$ and two derivatives of $g$, and so we assume here that $\nu_1 \ge 1$ and $\nu_2 \ge 2$. In Sections 2 and 4 we shall use (1.1) and (1.2) to show that kernel-type estimators achieve convergence rates

$$|\hat x_0 - x_0| = O_p\{\max(N^{-\nu_1/\{2(\nu_1+1)\}},\ N^{-(\nu_2-1)/(2\nu_2+1)})\}, \tag{1.3}$$

$$|\hat z_0 - z_0| = O_p\{N^{-(\nu_2-1)/(2\nu_2+1)}\}, \tag{1.4}$$

where $N$ denotes the number of pairs of signal factors and control factors in our sample. The first contribution to the right-hand side of (1.3) is due to the possible effect of not knowing $f$. When $\nu_1 \ge 1$ and $\nu_2 \ge 2$, not knowing $f$ has no effect on the accuracy with which we can estimate $z_0$, but does influence the accuracy with which we can estimate $x_0$. A necessary and sufficient condition for the right-hand side of (1.3) to equal $O_p(N^{-(\nu_2-1)/(2\nu_2+1)})$, and so for there to be no penalty in not knowing $f$, is $\nu_1 \ge (2/3)(\nu_2 - 1)$.

We shall prove in Section 3 that the rates of convergence described by (1.3) and (1.4) are optimal, in the sense that under the stated smoothness assumptions, no nonparametric estimator can achieve faster rates.

Result (1.1), which leads to rates of convergence for estimates of $z_0$, requires only $\nu_2 \ge 2$ and $\nu_1 > 0$. We shall show that in this general circumstance, the best achievable rate of convergence of any estimator of $z_0$ is

$$|\hat z_0 - z_0| = O_p\{\max(N^{-(\nu_2-1)/(2\nu_2+1)},\ N^{-\nu_1(\nu_2-1)/\{(\nu_1+1)\nu_2\}})\}. \tag{1.5}$$

For small $\nu_1$, this rate is inferior to that described by (1.4) unless $\nu_1(\nu_2-1)/\{(\nu_1+1)\nu_2\} \ge (\nu_2-1)/(2\nu_2+1)$; that is, unless $\nu_1 \ge \nu_2/(\nu_2+1)$. Of course, the latter inequality is always satisfied when $\nu_1 \ge 1$, and in that case (1.4) and (1.5) are identical.
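The rate comparisons above reduce to arithmetic on exponents. The following Python sketch (an illustration, not part of the original analysis) computes, with exact fractions, the exponents $e$ in $O_p(N^{-e})$ implied by (1.3) to (1.5), and checks the no-penalty condition $\nu_1 \ge (2/3)(\nu_2 - 1)$; the function names are hypothetical.

```python
from fractions import Fraction


def rate_exponents(nu1, nu2):
    """Return exponents e with |error| = O_p(N**(-e)), following (1.3)-(1.5).

    nu1, nu2: smoothness of the mean f and the variance function g
    (ints or Fractions), with nu2 >= 2. A smaller exponent means a slower rate.
    """
    nu1, nu2 = Fraction(nu1), Fraction(nu2)
    e_f = nu1 / (2 * (nu1 + 1))                   # estimating f(x0, z0)
    e_g = (nu2 - 1) / (2 * nu2 + 1)               # estimating g'(z0) with f known
    e_pen = nu1 * (nu2 - 1) / ((nu1 + 1) * nu2)   # penalty term in (1.5)
    e_z = min(e_g, e_pen)                         # (1.5); equals (1.4) when nu1 >= 1
    e_x = min(e_f, e_z)                           # worst of the two rates, via (1.2)
    return {"z0": e_z, "x0": e_x}


def no_penalty_for_x0(nu1, nu2):
    """Condition nu1 >= (2/3)(nu2 - 1), under which (1.3) reduces to the z0 rate."""
    return Fraction(nu1) >= Fraction(2, 3) * (Fraction(nu2) - 1)


print(rate_exponents(2, 3))   # smooth f: x0 and z0 share the rate N^(-2/7)
print(rate_exponents(1, 4))   # rough f: the x0 rate N^(-1/4) is slower than N^(-1/3)
```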

Most of our attention will be devoted to the case of an experiment of fixed design, defined by model (2.1) in Section 2. Fixed design is more realistic than random design in most control contexts, and is amenable to complete asymptotic analysis. Section 4 will outline analogous results in the random design case. Some of this work has a counterpart in heteroscedastic nonparametric regression, and will be discussed elsewhere in that context.

In some applications, our model (2.1) applies only after a data transformation of the response variable. Our discussion still applies to the closeness-to-target problem, by using approximations suggested by Box (1987); see his equation (15).

2. Fixed design case

In the fixed design case our model is

$$Y_{ij} = f(i/n, j/n) + g(j/n)^{1/2}\,\epsilon_{ij}, \qquad 1 \le i, j \le n, \tag{2.1}$$

where the $\epsilon_{ij}$'s are independent with zero means, unit variances and uniformly bounded fourth moments. We observe the data set $\{Y_{ij},\ 1 \le i, j \le n\}$, and wish to estimate $f$, $g$ and their derivatives. Note that there are $N = n^2$ observations, not $n$; this is important when comparing our results with those in classical nonparametric regression problems.
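To make model (2.1) concrete, here is a minimal simulation sketch. The Gaussian errors and the particular mean and variance functions are assumptions chosen for illustration only; array entry `[i-1, j-1]` stores the observation at design point $(i/n, j/n)$.

```python
import numpy as np


def simulate_fixed_design(n, f, g, rng):
    """One draw of the fixed-design model (2.1) on the n x n grid.

    Returns Y with Y[i-1, j-1] = f(i/n, j/n) + sqrt(g(j/n)) * eps_ij.
    Assumption: eps_ij ~ N(0, 1); (2.1) itself only asks for zero mean,
    unit variance and bounded fourth moments.
    """
    grid = np.arange(1, n + 1) / n
    x, z = np.meshgrid(grid, grid, indexing="ij")   # x follows i, z follows j
    eps = rng.standard_normal((n, n))
    return f(x, z) + np.sqrt(g(z)) * eps


# Hypothetical mean and variance functions, chosen only for illustration.
rng = np.random.default_rng(1)
Y = simulate_fixed_design(100, lambda x, z: x + z ** 2, lambda z: 1 + (z - 0.5) ** 2, rng)
print(Y.size)   # N = n**2 = 10000 observations
```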

Let $\nu > 0$, and write $\langle\nu\rangle$ for the largest integer strictly less than $\nu$. A univariate function $g$ is said to be $\nu$-smooth if it has $\langle\nu\rangle$ bounded derivatives and if $g^{(\langle\nu\rangle)}$ satisfies a Lipschitz condition of order $\nu - \langle\nu\rangle$:

$$|g^{(\langle\nu\rangle)}(x) - g^{(\langle\nu\rangle)}(y)| \le C|x - y|^{\nu - \langle\nu\rangle}$$

for all $x, y \in (0,1)$. A bivariate function $f$ is said to be $\nu$-smooth if $f^{(i,j)}(x,y)$ exists and is bounded for all $i \ge 0$, $j \ge 0$ satisfying $i + j \le \langle\nu\rangle$, and if

$$|f^{(i,j)}(u,v) - f^{(i,j)}(x,y)| \le C(|u - x|^{\nu - \langle\nu\rangle} + |v - y|^{\nu - \langle\nu\rangle})$$

for all $u, v, x, y \in (0,1)$ and all $i \ge 0$, $j \ge 0$ satisfying $i + j = \langle\nu\rangle$. We assume that in model (2.1), the bivariate mean function $f$ is $\nu_1$-smooth and the univariate variance function $g$ is $\nu_2$-smooth.

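Our estimates of $f$ and $g$ are based on fixed-design analogues of kernel sequences, which may be defined as follows. Given $0 < h_1, h_2 < 1$, and nonnegative integers $r$, $s$ and $t$, let $\{a_k(h_1),\ -\infty < k < \infty\}$, $\{b_k(h_1),\ -\infty < k < \infty\}$ and $\{c_k(h_2),\ -\infty < k < \infty\}$ be sequences of constants satisfying

$$\begin{gathered}
|a_k| \le C h_1^{r+1}, \quad |b_k| \le C h_1^{s+1}, \quad |c_k| \le C h_2^{t+1}, \quad a_k = b_k = 0 \text{ if } |k| > C h_1^{-1}, \quad c_k = 0 \text{ if } |k| > C h_2^{-1},\\
\sum_k k^i a_k = \begin{cases} r! & \text{if } i = r,\\ 0 & \text{if } 0 \le i \le \langle\nu_1\rangle \text{ and } i \ne r,\end{cases} \qquad
\sum_k k^i b_k = \begin{cases} s! & \text{if } i = s,\\ 0 & \text{if } 0 \le i \le \langle\nu_1\rangle \text{ and } i \ne s,\end{cases}\\
\sum_k k^i c_k = \begin{cases} t! & \text{if } i = t,\\ 0 & \text{if } 0 \le i \le \langle\nu_2\rangle \text{ and } i \ne t.\end{cases}
\end{gathered} \tag{2.2}$$

The constant $C$ does not depend on $h_1$ or $h_2$.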

To construct $\{a_k\}$, for example, let $K$ be a compactly supported, real-valued, $r$-times continuously differentiable function satisfying $\int u^i K(u)\,du = 1$ if $i = 0$, and $0$ if $1 \le i \le \langle\nu_1\rangle$. Put $L(u) = (-1)^r K^{(r)}(u)$. Then $\int u^i L(u)\,du = r!$ if $i = r$, and $0$ if $0 \le i \le \langle\nu_1\rangle$ and $i \ne r$. A slight adjustment of $L$, taking account of errors in series approximations to integrals and giving the function $L'$, say, ensures that $a_k = h_1^{r+1} L'(h_1 k)$ has the desired properties.
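The moment conditions in (2.2) can also be met exactly by solving a small linear system. The sketch below is a swapped-in construction, not the adjusted $L'$ of the text: it assumes weights of the form $K(hk)\,p(hk)$, with $K$ an Epanechnikov-type weight and $p$ a polynomial of degree $\langle\nu\rangle$ chosen so that the discrete moments equal $r!$ at $i = r$ and vanish otherwise. The name `discrete_kernel_weights` is hypothetical.

```python
import math

import numpy as np


def discrete_kernel_weights(h, deriv, top):
    """Weights {w_k} with compact support and exact discrete moments, as in (2.2).

    Returns (ks, w) such that w_k = 0 for |k| > 1/h and, for 0 <= i <= top,
        sum_k k**i * w_k = deriv!  if i == deriv,  and 0 otherwise.
    `top` plays the role of <nu> and `deriv` the role of r (or s, t).
    Assumed form: w_k = K(h k) * p(h k), with K an Epanechnikov-type weight and
    p a polynomial of degree `top` solved for exactly.
    """
    m = int(math.floor(1.0 / h))
    ks = np.arange(-m, m + 1)
    u = h * ks                                    # rescaled offsets in [-1, 1]
    K = np.clip(1.0 - u ** 2, 0.0, None)          # compactly supported weight
    V = np.vander(u, top + 1, increasing=True)    # columns u**0, ..., u**top
    M = V.T @ (K[:, None] * V)                    # M[i, j] = sum_k K_k u_k**(i+j)
    rhs = np.zeros(top + 1)
    # sum_k u**i w_k = h**i * sum_k k**i w_k, so the target moment is deriv! * h**deriv.
    rhs[deriv] = math.factorial(deriv) * h ** deriv
    beta = np.linalg.solve(M, rhs)
    return ks, K * (V @ beta)


ks, a = discrete_kernel_weights(0.05, deriv=1, top=3)
print(np.sum(ks * a))   # first discrete moment, 1! = 1 up to rounding
```

Working in the rescaled offsets $u = hk$ keeps the linear system well conditioned; the weights then have magnitude of order $h^{r+1}$, in line with the bound on $|a_k|$ in (2.2).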

Our estimator of $f^{(r,s)}$ is

$$\hat f^{(r,s)}(i/n, j/n) = n^{r+s}\sum_k\sum_l a_k b_l\,Y_{i+k,\,j+l}, \tag{2.3}$$

where $Y_{ij}$ is defined to be zero if one or other of $i, j$ is less than one or greater than $n$.
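Because the weights in (2.3) factor as $a_k b_l$, the estimator can be computed as two one-dimensional weighted shifts. A minimal sketch, assuming the data are stored as an $n \times n$ array as in model (2.1) and that each weight sequence is supplied together with its integer offsets $k$; the function names are hypothetical.

```python
import numpy as np


def shift_sum(Y, weights, offsets, axis):
    """Return S with S[i] = sum_k weights[k] * Y[i + offsets[k]] along `axis`,
    treating indices outside the grid as zero (the convention below (2.3))."""
    n = Y.shape[axis]
    out = np.zeros(Y.shape, dtype=float)
    for k, w in zip(offsets, weights):
        k = int(k)
        if w == 0 or abs(k) >= n:
            continue
        src = [slice(None)] * Y.ndim
        dst = [slice(None)] * Y.ndim
        if k >= 0:
            src[axis], dst[axis] = slice(k, n), slice(0, n - k)
        else:
            src[axis], dst[axis] = slice(0, n + k), slice(-k, n)
        out[tuple(dst)] += w * Y[tuple(src)]
    return out


def f_hat(Y, a, a_offsets, b, b_offsets, r, s):
    """Kernel estimator (2.3) of f^(r,s) at every grid point (i/n, j/n):
    n**(r+s) * sum_k sum_l a_k b_l Y[i+k, j+l]."""
    n = Y.shape[0]
    inner = shift_sum(Y, a, a_offsets, axis=0)                    # smooth in the signal index i
    return n ** (r + s) * shift_sum(inner, b, b_offsets, axis=1)  # then in the control index j
```

For a quick check one may use the three-point moving average $(1/3, 1/3, 1/3)$, which satisfies (2.2) with $r = 0$ and $\langle\nu_1\rangle = 1$, and the central difference $(-1/2, 0, 1/2)$, which satisfies (2.2) with $s = 1$ and $\langle\nu_1\rangle = 2$. These toy weights are illustrative only; the theory calls for weights whose support grows like $h_1^{-1}$.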
Basic properties of $\hat f^{(r,s)}$ are described by the following theorem.

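Theorem 2.1. Assume $f$ is $\nu_1$-smooth, $\nu_1 > r + s$, $g$ is bounded, $\sup E(\epsilon_{ij}^2) < \infty$, and $h_1 = h_1(n)$ satisfies $h_1 \to 0$ and $nh_1 \to \infty$. Then for each $0 < \delta < \tfrac12$,

$$\sup_{\delta n \le i, j \le (1-\delta)n} |E\hat f^{(r,s)}(i/n, j/n) - f^{(r,s)}(i/n, j/n)| = O\{(nh_1)^{-(\nu_1 - r - s)}\}, \tag{2.4}$$

$$\sup_{1 \le i, j \le n} \operatorname{var}\{\hat f^{(r,s)}(i/n, j/n)\} = O\{(nh_1)^{2(r+s)} h_1^2\}. \tag{2.5}$$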

REMARK 2.1 Given any $(x, z) \in (0,1)^2$, we may define $\hat f^{(r,s)}(x,z)$ by linear interpolation among the four vertices of the grid square containing $(x,z)$. It is easily shown that analogues of (2.4) and (2.5) hold for this "more general" estimator:

$$\sup_{\delta \le x, z \le 1-\delta} |E\hat f^{(r,s)}(x,z) - f^{(r,s)}(x,z)| = O\{(nh_1)^{-(\nu_1 - r - s)}\},$$

$$\sup_{0 < x, z < 1} \operatorname{var}\{\hat f^{(r,s)}(x,z)\} = O\{(nh_1)^{2(r+s)} h_1^2\}.$$

REMARK 2.2 It follows from Theorem 2.1 that the mean squared error of $\hat f^{(r,s)}$ is

$$E\{\hat f^{(r,s)}(i/n, j/n) - f^{(r,s)}(i/n, j/n)\}^2 = O\{(nh_1)^{-2(\nu_1 - r - s)} + (nh_1)^{2(r+s)} h_1^2\}, \tag{2.6}$$

uniformly in $\delta n \le i, j \le (1-\delta)n$. The order of magnitude of the right-hand side is minimized at $O(n^{-2(\nu_1 - r - s)/(\nu_1+1)}) = O(N^{-(\nu_1 - r - s)/(\nu_1+1)})$ by taking $h_1 = n^{-\nu_1/(\nu_1+1)}$.

By modifying techniques of Stone (1980) we may show that the rate $O(N^{-(\nu_1 - r - s)/(\nu_1+1)})$ is optimal in a minimax sense, where the maximum is over the class of $\nu_1$-smooth functions having a given constant $C$ in the Lipschitz condition and in bounds on derivatives.
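The bandwidth choice in Remark 2.2 comes from equating the two terms in (2.6). A short sketch (constants dropped, names hypothetical) confirms that $h_1 = n^{-\nu_1/(\nu_1+1)}$ balances squared bias and variance at the stated order:

```python
def f_mse_terms(n, h1, nu1, r=0, s=0):
    """Squared-bias and variance orders on the right of (2.6), constants dropped."""
    bias2 = (n * h1) ** (-2 * (nu1 - r - s))
    var = (n * h1) ** (2 * (r + s)) * h1 ** 2
    return bias2, var


def f_bandwidth(n, nu1):
    """h1 = n**(-nu1/(nu1+1)), the order-optimal choice in Remark 2.2."""
    return n ** (-nu1 / (nu1 + 1))


n, nu1 = 10 ** 4, 2.0
b2, v = f_mse_terms(n, f_bandwidth(n, nu1), nu1)
print(b2, v, n ** (-2 * nu1 / (nu1 + 1)))   # the three values agree up to rounding
```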
If we knew $f$ we could form the "true" residuals $r_{ij} = Y_{ij} - f(i/n, j/n) = g(j/n)^{1/2}\epsilon_{ij}$, and construct an estimator $\tilde g^{(t)}$ of $g^{(t)}$ as follows:

$$\tilde g^{(t)}(j/n) = n^{t-1}\sum_{i=1}^{n}\sum_k c_k\, r_{i,j+k}^2. \tag{2.7}$$

Here $r_{ij}$ is defined to be zero if $j < 1$ or $j > n$, and $\{c_k\}$ is as in (2.2). An argument similar to that employed to prove Theorem 2.1 may be used to establish:

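Theorem 2.2. Assume $g$ is $\nu_2$-smooth, $\nu_2 > t$, $E(\epsilon_{ij}^4)$ is uniformly bounded, and $h_2 = h_2(n)$ satisfies $h_2 \to 0$ and $nh_2 \to \infty$. Then for each $0 < \delta < \tfrac12$,

$$\sup_{\delta n \le j \le (1-\delta)n} |E\tilde g^{(t)}(j/n) - g^{(t)}(j/n)| = O\{(nh_2)^{-(\nu_2 - t)}\},$$

$$\sup_{1 \le j \le n} \operatorname{var}\{\tilde g^{(t)}(j/n)\} = O\{(nh_2)^{2t-1} h_2^2\}.$$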

REMARK 2.3 It follows from Theorem 2.2 that the mean squared error of $\tilde g^{(t)}$ satisfies

$$E\{\tilde g^{(t)}(j/n) - g^{(t)}(j/n)\}^2 = O\{(nh_2)^{-2(\nu_2 - t)} + (nh_2)^{2t-1} h_2^2\}. \tag{2.8}$$

The right-hand side here is minimized by taking $h_2 = n^{-(2\nu_2 - 1)/(2\nu_2 + 1)}$, giving a mean squared error of $O(n^{-4(\nu_2 - t)/(2\nu_2+1)}) = O(N^{-2(\nu_2 - t)/(2\nu_2+1)})$. Again, this rate is optimal if $f$ is known. However, we pay a penalty for not knowing $f$, as Theorem 2.3 below shows.

Replace the true residual $r_{ij}$ by its estimate $\hat r_{ij} = Y_{ij} - \hat f(i/n, j/n)$, giving rise to the following practical estimator of $g^{(t)}$:

$$\hat g^{(t)}(j/n) = n^{t-1}\sum_{i=1}^{n}\sum_k c_k\, \hat r_{i,j+k}^2. \tag{2.9}$$
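Estimator (2.9) first sums squared residuals over the signal index $i$ and then applies the weights $\{c_k\}$ along the control index $j$. A minimal sketch under the storage convention of model (2.1): passing $\hat r_{ij}^2 = \{Y_{ij} - \hat f(i/n, j/n)\}^2$ gives (2.9), while passing true squared residuals gives (2.7). The function name is hypothetical.

```python
import numpy as np


def g_hat(R2, c, c_offsets, t):
    """Variance-function estimator (2.9) (or (2.7)) of g^(t) at each j/n.

    R2[i-1, j-1] holds the squared residual for cell (i, j); c are the weights
    {c_k} on integer offsets c_offsets. Columns outside 1..n count as zero.
    """
    n = R2.shape[1]
    col = R2.sum(axis=0)             # sum over the signal index i, one value per j
    out = np.zeros(n)
    for k, ck in zip(c_offsets, c):
        k = int(k)
        if abs(k) >= n:
            continue
        if k >= 0:
            out[: n - k] += ck * col[k:]       # out[j] += c_k * col[j + k]
        else:
            out[-k:] += ck * col[: n + k]
    return float(n) ** (t - 1) * out
```

The factor $n^{t-1}$ combines an average over the $n$ rows with the $n^t$ scaling that turns the discrete moment conditions on $\{c_k\}$ into derivatives in $z = j/n$.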

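Theorem 2.3. Assume $f$ is $\nu_1$-smooth, $g$ is $\nu_2$-smooth, $\nu_2 > t$, $E(\epsilon_{ij}^4)$ is uniformly bounded, and $h_i = h_i(n)$ satisfies $h_i \to 0$ and $nh_i \to \infty$ for $i = 1, 2$. Then for each $0 < \delta < \tfrac12$,

$$\sup_{\delta n \le j \le (1-\delta)n} E\{\hat g^{(t)}(j/n) - g^{(t)}(j/n)\}^2 = O[(nh_2)^{-2(\nu_2 - t)} + (nh_2)^{2t-1} h_2^2 + (nh_2)^{2t}\{(nh_1)^{-2\nu_1} + h_1^2\}^2]. \tag{2.10}$$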
REMARK 2.4 The order of the mean squared error of $\hat g^{(t)}$ is that of the mean squared error of $\tilde g^{(t)}$, plus $(nh_2)^{2t}$ times the square of the mean squared error of $\hat f$; compare (2.8) and (2.10), noting result (2.6) for $r = s = 0$. The additional term represents the penalty in not knowing $f$ when estimating $g^{(t)}$.

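REMARK 2.5 The value of $h_1$ which minimizes the order of the second term on the right-hand side of (2.10) is $h_1^* = n^{-\nu_1/(\nu_1+1)}$. Using this value of $h_1$ we find that

$$E\{\hat g^{(t)}(j/n) - g^{(t)}(j/n)\}^2 = O[\{(nh_2)^{-2(\nu_2 - t)} + (nh_2)^{2t-1} h_2^2\} + (nh_2)^{2t} n^{-4\nu_1/(\nu_1+1)}]. \tag{2.11}$$

The value of $h_2$ which minimizes the order of $A(h_2) = (nh_2)^{-2(\nu_2 - t)} + (nh_2)^{2t-1} h_2^2$ is $h_2 = h_2^* = n^{-(2\nu_2 - 1)/(2\nu_2 + 1)}$, and $A(h_2^*) = 2n^{-4(\nu_2 - t)/(2\nu_2 + 1)}$. Furthermore, $(nh_2^*)^{2t} n^{-4\nu_1/(\nu_1+1)} \le A(h_2^*)$ if and only if

$$\nu_1 \ge \nu_2/(\nu_2 + 1). \tag{2.12}$$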

Therefore when (2.12) is true, the term involving $h_1$ on the right-hand side of (2.10) does not influence the convergence rate of the optimally constructed version of $\hat g^{(t)}$, and for $h_1 = h_1^*$ and $h_2 = h_2^*$,

$$E\{\hat g^{(t)}(j/n) - g^{(t)}(j/n)\}^2 = O(n^{-4(\nu_2 - t)/(2\nu_2 + 1)}).$$

This is the same as the best rate of convergence of $\tilde g^{(t)}$; see Remark 2.3.

REMARK 2.6 If (2.12) fails then there is a cost to estimating $f$. An optimal balance among terms on the right-hand side of (2.11) is achieved by making $(nh_2)^{-2(\nu_2 - t)}$ the same size as $(nh_2)^{2t} n^{-4\nu_1/(\nu_1+1)}$. That is, take $h_2 = h_2^{**} = n^{\{2\nu_1 - \nu_2(\nu_1+1)\}/\{(\nu_1+1)\nu_2\}}$, in which case

$$E\{\hat g^{(t)}(j/n) - g^{(t)}(j/n)\}^2 = O(n^{-4\nu_1(\nu_2 - t)/\{(\nu_1+1)\nu_2\}}).$$

REMARK 2.7 Note that $h_2^{**}$ (the optimal version of $h_2$ when (2.12) fails) is different from $h_2^*$ (the optimal $h_2$ when (2.12) holds). Also, none of $h_1^*$, $h_2^*$ and $h_2^{**}$ depends on $t$.
REMARK 2.8 We may summarize the main points made during Remarks 2.5 and 2.6 by stating that if $\hat g^{(t)}$ is constructed using $h_1 = h_1^*$ and $h_2 = h_2^*$ (if (2.12) holds) or $h_2 = h_2^{**}$ (if (2.12) fails), then

$$E\{\hat g^{(t)}(j/n) - g^{(t)}(j/n)\}^2 = O\{\max(n^{-4(\nu_2 - t)/(2\nu_2 + 1)},\ n^{-4\nu_1(\nu_2 - t)/\{(\nu_1+1)\nu_2\}})\}. \tag{2.13}$$

The term involving only $\nu_2$ dominates the right-hand side here if (2.12) holds, while the other term dominates if (2.12) fails. We shall show in Section 3 that the rate of convergence described by (2.13) is optimal in a minimax sense.
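Remarks 2.5 to 2.8 amount to a rule for choosing $(h_1, h_2)$. The sketch below (constants ignored, names hypothetical) encodes that rule and evaluates the right-hand side of (2.10), so one can check numerically that it matches the rate in (2.13) up to a bounded factor.

```python
def bandwidth_orders(n, nu1, nu2):
    """Orders of h1* and of h2* or h2** (Remarks 2.5-2.6), constants ignored."""
    h1 = n ** (-nu1 / (nu1 + 1))
    if nu1 >= nu2 / (nu2 + 1):                                          # (2.12) holds
        h2 = n ** (-(2 * nu2 - 1) / (2 * nu2 + 1))                      # h2*
    else:                                                               # (2.12) fails
        h2 = n ** ((2 * nu1 - nu2 * (nu1 + 1)) / ((nu1 + 1) * nu2))     # h2**
    return h1, h2


def g_mse_order(n, h1, h2, nu1, nu2, t):
    """Right-hand side of (2.10) with constants dropped."""
    bias2 = (n * h2) ** (-2 * (nu2 - t))
    var = (n * h2) ** (2 * t - 1) * h2 ** 2
    penalty = (n * h2) ** (2 * t) * ((n * h1) ** (-2 * nu1) + h1 ** 2) ** 2
    return bias2 + var + penalty


def rate_213(n, nu1, nu2, t):
    """The rate on the right of (2.13)."""
    return max(n ** (-4 * (nu2 - t) / (2 * nu2 + 1)),
               n ** (-4 * nu1 * (nu2 - t) / ((nu1 + 1) * nu2)))


n = 10 ** 6
h1, h2 = bandwidth_orders(n, 0.5, 3.0)    # (2.12) fails here, so h2 = h2**
print(g_mse_order(n, h1, h2, 0.5, 3.0, 1) / rate_213(n, 0.5, 3.0, 1))
```

With $\nu_1 = 0.5$ and $\nu_2 = 3$, condition (2.12) fails and the penalty term for not knowing $f$ sets the rate; with $\nu_1 \ge 1$ it always holds and the $\nu_2$-only term dominates.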

To solve the first part of our control problem we need to estimate that value $z_0$ which minimizes $g$. If $g$ has a continuous derivative then this amounts to estimating the solution $z_0$ of the equation $g^{(1)}(z) = 0$. A potential estimator $\hat g^{(1)}(z)$ of $g^{(1)}(z)$ may be obtained by interpolating among values of $\hat g^{(1)}(j/n)$, defined at (2.9). However, this approach results in a very rough estimator, without even a single continuous derivative. There are several ways of deriving a smoother estimator. One is to derive $\hat g^{(2)}(z)$ by linearly interpolating among values of $\hat g^{(2)}(j/n)$, and then to estimate $g^{(1)}$ by integrating $\hat g^{(2)}$. This we do below.

Define $\hat g^{(1)}(j/n)$ and $\hat g^{(2)}(j/n)$ as at (2.9), construct $\hat g^{(2)}(z)$ by linearly interpolating among the points $\hat g^{(2)}(j/n)$, and for an arbitrary $j_0$ satisfying $j_0 \sim n\alpha$, some $0 < \alpha < 1$, put

$$\hat g^{(1)}(z) = \hat g^{(1)}(j_0/n) + \int_{j_0/n}^{z} \hat g^{(2)}(u)\,du, \qquad 0 < z < 1.$$

This will be our estimator of $g^{(1)}(z)$. It is continuously differentiable, with derivative $(\hat g^{(1)})'(z) = \hat g^{(2)}(z)$, and is a quadratic interpolation of an estimator "like" $\hat g^{(1)}$. It shares the mean squared error properties of $\hat g^{(1)}$, as follows.

Theorem 2.4. Assume the conditions of Theorem 2.3, with $t = 1$. Then for each $0 < \delta < \tfrac12$,

$$\sup_{\delta \le z \le 1-\delta} E\{\hat g^{(1)}(z) - g^{(1)}(z)\}^2 = O[\{(nh_2)^{-2(\nu_2 - 1)} + nh_2^3\} + (nh_2)^2\{(nh_1)^{-2\nu_1} + h_1^2\}^2]. \tag{2.14}$$
REMARK 2.9 Note that the right-hand sides of (2.10) (for $t = 1$) and (2.14) are identical.

REMARK 2.10 The conditions in Theorem 2.4 do not require the existence of a second derivative of $g$, even though $\hat g^{(2)}$ is used in the construction of $\hat g^{(1)}$. We need only assume $\nu_2 > 1$; of course, $\hat g^{(2)}$ is well-defined without any smoothness assumptions, being given by formula (2.9).
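The integrated estimator of $g^{(1)}$ can be evaluated exactly, because the integral of a piecewise-linear interpolant is a sum of trapezoids. A minimal sketch, assuming the grid values $\hat g^{(2)}(j/n)$ and the starting value $\hat g^{(1)}(j_0/n)$ have already been computed from (2.9); the function name is hypothetical, and evaluation is restricted to $1/n \le z \le 1$, where the interpolant is defined.

```python
import numpy as np


def integrate_linear_interpolant(grid_vals, start_value, j0, z):
    """start_value + integral from j0/n to z of the piecewise-linear interpolant
    of grid_vals[j-1] (the value at j/n, j = 1..n); z should lie in [1/n, 1].

    Each linear piece is integrated exactly by the trapezoid rule, the identity
    used in the proof of Theorem 2.4.
    """
    grid_vals = np.asarray(grid_vals, dtype=float)
    n = len(grid_vals)
    nodes = np.arange(1, n + 1) / n
    lo, hi = sorted((j0 / n, z))
    sign = 1.0 if z >= j0 / n else -1.0
    pts = np.concatenate(([lo], nodes[(nodes > lo) & (nodes < hi)], [hi]))
    vals = np.interp(pts, nodes, grid_vals)
    integral = np.sum(np.diff(pts) * (vals[1:] + vals[:-1]) / 2.0)
    return start_value + sign * integral
```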

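We are now in a position to solve the first part of our control problem. Let $\hat z_0$ be any solution of the equation $\hat g^{(1)}(\hat z_0) = 0$, and let $z_0$ be the unique solution of $g^{(1)}(z_0) = 0$. Then

$$0 = \hat g^{(1)}(\hat z_0) = \hat g^{(1)}(z_0) + (\hat z_0 - z_0)\,\hat g^{(2)}\{z_0 + \theta(\hat z_0 - z_0)\}, \tag{2.15}$$

where $0 < \theta < 1$. Assume $g$ is $\nu_2$-smooth for some $\nu_2 > 2$. Then $g^{(2)}$ is well-defined and continuous. Suppose that for an integer $l \ge 1$, $4l$th moments of the errors $\epsilon_{ij}$ are uniformly bounded. Then the argument leading to Theorem 2.3 may be generalized to prove that for each $0 < \delta < \tfrac12$,

$$\sup_{\delta n \le j \le (1-\delta)n} E\{\hat g^{(2)}(j/n) - g^{(2)}(j/n)\}^{2l} = O\{B_2(h_1, h_2)^l\},$$

where $B_t(h_1, h_2) = \{(nh_2)^{-2(\nu_2 - t)} + (nh_2)^{2t-1} h_2^2\} + (nh_2)^{2t}\{(nh_1)^{-2\nu_1} + h_1^2\}^2$. Choose $h_1, h_2$ to minimize the order of $B_t(h_1, h_2)$, as described in Remark 2.8; by Remark 2.7 this choice does not depend on $t$. Then $B_2(h_1, h_2) = O(n^{-b})$, where $b = \min[4(\nu_2 - 2)/(2\nu_2 + 1),\ 4\nu_1(\nu_2 - 2)/\{(\nu_1+1)\nu_2\}] > 0$. If $l > 1/b$ then for each $\eta > 0$ and each $0 < \delta < \tfrac12$ we have, by Markov's inequality applied to each of the at most $n$ values of $j$,

$$P\Big\{\sup_{\delta n \le j \le (1-\delta)n} |\hat g^{(2)}(j/n) - g^{(2)}(j/n)| > \eta\Big\} = O(n^{1 - bl}) = o(1),$$

so that, since $\hat g^{(2)}$ is a linear interpolant and $g^{(2)}$ is continuous,

$$\sup_{\delta \le z \le 1-\delta} |\hat g^{(2)}(z) - g^{(2)}(z)| = o_p(1). \tag{2.16}$$

Therefore by (2.15), assuming that $g^{(2)}(z_0) \ne 0$,

$$\hat z_0 - z_0 = -\{1 + o_p(1)\}\{\hat g^{(1)}(z_0) - g^{(1)}(z_0)\}/g^{(2)}(z_0).$$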
We conclude that $\hat z_0$ converges to $z_0$ at the same rate as $\hat g^{(1)}(z_0)$ converges to $g^{(1)}(z_0)$; that is,

$$|\hat z_0 - z_0| = O_p\{\max(n^{-2(\nu_2 - 1)/(2\nu_2 + 1)},\ n^{-2\nu_1(\nu_2 - 1)/\{(\nu_1+1)\nu_2\}})\}. \tag{2.17}$$

This is result (1.5), announced in Section 1, and implies (1.4) when $\nu_1 \ge 1$.
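The linearization behind (2.15) can be checked on a toy example. In the sketch below the functions and the perturbation are hypothetical: $g^{(1)}(z) = 2(z - z_0)$, so $g^{(2)}(z_0) = 2$, and the estimate of $g^{(1)}$ carries a smooth error $\gamma(1 + z)$. The root found by bisection and the one-step approximation $z_0 - \hat g^{(1)}(z_0)/g^{(2)}(z_0)$ then differ only at order $\gamma^2$.

```python
def bisect_root(func, lo, hi, tol=1e-13, max_iter=200):
    """Root of func on [lo, hi] by bisection; func(lo) and func(hi) must differ in sign."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("func(lo) and func(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid                    # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid       # sign change lies in [mid, hi]
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical toy setting: g'(z) = 2 (z - z0), so g''(z0) = 2, and the
# estimate of g' carries a small smooth error gamma * (1 + z).
z0, gamma = 0.4, 1e-3


def g1_hat(z):
    return 2.0 * (z - z0) + gamma * (1.0 + z)


z0_hat = bisect_root(g1_hat, 0.0, 1.0)
linearized = z0 - g1_hat(z0) / 2.0      # one-step version of (2.15)
print(z0_hat - z0, linearized - z0)     # both approximately -gamma * (1 + z0) / 2
```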

The second part of our control problem consists of estimating the value $x_0$ which satisfies $f(x_0, z_0) = \tau_0$. An estimator of $f$ is $\hat f = \hat f^{(0,0)}$, defined at (2.3) with $r = s = 0$. However, as in the case of our estimator of $g^{(1)}$, this suffers from being "too rough". Therefore we compute $\hat f^{(0,1)}$, $\hat f^{(1,0)}$ and $\hat f^{(1,1)}$ by linearly interpolating among values defined at (2.3), and then derive an estimator $\hat f$ of $f$ by integration, as follows. Let $i_0, j_0$ satisfy $i_0 \sim n\alpha$, $j_0 \sim n\beta$, where $0 < \alpha, \beta < 1$, and put

$$\hat f(x,z) = \hat f(i_0/n, j_0/n) + \int_{i_0/n}^{x} \hat f^{(1,0)}(u, j_0/n)\,du + \int_{j_0/n}^{z} \hat f^{(0,1)}(i_0/n, v)\,dv + \int_{i_0/n}^{x} du \int_{j_0/n}^{z} \hat f^{(1,1)}(u,v)\,dv, \qquad 0 < x, z < 1. \tag{2.18}$$

This will be our estimator of $f(x,z)$. It is continuously differentiable in both variables, satisfying

$$(\partial/\partial x)\hat f(x,z) = \hat f^{(1,0)}(x, j_0/n) + \int_{j_0/n}^{z} \hat f^{(1,1)}(x, v)\,dv$$

and an analogous expression for $(\partial/\partial z)\hat f(x,z)$. It shares the mean squared error properties of $\hat f^{(0,0)}$, as follows.

Theorem 2.5. Assume the conditions of Theorem 2.1, with $r = s = 0$. Then for each $0 < \delta < \tfrac12$,

$$\sup_{\delta \le x, z \le 1-\delta} E\{\hat f(x,z) - f(x,z)\}^2 = O\{(nh_1)^{-2\nu_1} + h_1^2\}. \tag{2.19}$$

REMARK 2.11 Note that the right-hand sides of (2.6) (for $r = s = 0$) and (2.19) are identical.
REMARK 2.12 Theorem 2.5 does not require the existence of any derivative of $f$, even though numerical values of $\hat f^{(0,1)}$, $\hat f^{(1,0)}$ and $\hat f^{(1,1)}$ are used in the construction of $\hat f$.
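Formula (2.18) can likewise be evaluated exactly for tensor-product linear interpolants of the grid estimates, since the iterated trapezoid rule integrates them without error. A sketch under these assumptions (grid arrays indexed as in model (2.1), with rows following $i$ and columns following $j$; all helper names hypothetical):

```python
import numpy as np


def _pieces(a, b, nodes):
    """Breakpoints on [min(a, b), max(a, b)] (end points plus interior grid nodes)
    and the orientation sign of the integral from a to b."""
    lo, hi = min(a, b), max(a, b)
    pts = np.concatenate(([lo], nodes[(nodes > lo) & (nodes < hi)], [hi]))
    return pts, (1.0 if b >= a else -1.0)


def _bilinear(grid, nodes, xs, zs):
    """Tensor-product linear interpolation of grid[i-1, j-1] (value at (i/n, j/n))
    at every point (xs[p], zs[q])."""
    along_x = np.column_stack([np.interp(xs, nodes, grid[:, j]) for j in range(grid.shape[1])])
    return np.vstack([np.interp(zs, nodes, along_x[p]) for p in range(len(xs))])


def _trapz(pts, vals):
    return float(np.sum(np.diff(pts) * (vals[1:] + vals[:-1]) / 2.0))


def f_integrated(x, z, f00, F10, F01, F11, i0, j0):
    """Evaluate (2.18): f00 = f_hat(i0/n, j0/n), and F10, F01, F11 are n x n grids
    of f_hat^(1,0), f_hat^(0,1), f_hat^(1,1) computed from (2.3)."""
    n = F10.shape[0]
    nodes = np.arange(1, n + 1) / n
    xi, zj = i0 / n, j0 / n
    xs, sx = _pieces(xi, x, nodes)
    zs, sz = _pieces(zj, z, nodes)
    term_x = sx * _trapz(xs, _bilinear(F10, nodes, xs, np.array([zj]))[:, 0])
    term_z = sz * _trapz(zs, _bilinear(F01, nodes, np.array([xi]), zs)[0, :])
    cross = _bilinear(F11, nodes, xs, zs)
    inner = np.array([_trapz(zs, cross[p]) for p in range(len(xs))])
    return f00 + term_x + term_z + sx * sz * _trapz(xs, inner)
```

As a sanity check, feeding in the exact derivatives of a bilinear $f$ reproduces $f$ everywhere, which mirrors the identity behind (2.18).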

We are now in a position to solve the second part of our control problem. Suppose $f$ is $\nu_1$-smooth, where $\nu_1 > 1$. Then $f^{(0,1)}$ and $f^{(1,0)}$ are well-defined and continuous. Assume $f^{(0,1)}(x_0, z_0) \ne 0 \ne f^{(1,0)}(x_0, z_0)$. Define $\hat f$ as at (2.18), and write $\hat f^{(i,j)}(x,z)$ for $(\partial/\partial x)^i(\partial/\partial z)^j \hat f(x,z)$. Choose $h_1 = h_1^*$ to minimize the order of $(nh_1)^{-2\nu_1} + h_1^2$. Then by (2.19),

$$|\hat f(x_0, z_0) - f(x_0, z_0)| = O_p(n^{-\nu_1/(\nu_1+1)}). \tag{2.20}$$

Suppose that for an integer $l \ge 1$, $2l$th moments of the errors $\epsilon_{ij}$ are uniformly bounded. The argument leading to (2.16) may be modified to show that if $l$ is sufficiently large then for each $0 < \delta < \tfrac12$,

$$\sup_{\delta \le x, z \le 1-\delta} |\hat f^{(i,j)}(x,z) - f^{(i,j)}(x,z)| = o_p(1) \tag{2.21}$$

for $(i,j) = (0,1)$ or $(1,0)$. Using the Taylor expansion which produced (1.2) we may now deduce that

$$\hat x_0 - x_0 = -\{1 + o_p(1)\}\{\hat f(x_0, z_0) - f(x_0, z_0)\}/f^{(1,0)}(x_0, z_0) - \{1 + o_p(1)\}(\hat z_0 - z_0)\,f^{(0,1)}(x_0, z_0)/f^{(1,0)}(x_0, z_0).$$

We conclude that the rate of convergence of $\hat x_0$ to $x_0$ is the worst of the rates of convergence of $\hat f(x_0, z_0)$ to $f(x_0, z_0)$ and of $\hat z_0$ to $z_0$. By (2.17) and (2.20), this is

$$\begin{aligned}
|\hat x_0 - x_0| &= O_p\{\max(n^{-\nu_1/(\nu_1+1)},\ n^{-2(\nu_2 - 1)/(2\nu_2 + 1)},\ n^{-2\nu_1(\nu_2 - 1)/\{(\nu_1+1)\nu_2\}})\}\\
&= O_p\{\max(n^{-\nu_1/(\nu_1+1)},\ n^{-2(\nu_2 - 1)/(2\nu_2 + 1)})\},
\end{aligned}$$

the second identity following from the fact that $\nu_1 \ge 1$. This is result (1.3), announced in Section 1.
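Once smooth estimates of $g^{(1)}$ and $f$ are available, the two-step rule of this section reduces to two one-dimensional root searches. A sketch, assuming each equation has a single sign change on $[\delta, 1 - \delta]$; the callables in the example are hypothetical stand-ins for the integrated estimator of $g^{(1)}$ and for (2.18).

```python
def bisect_root(func, lo, hi, tol=1e-13, max_iter=200):
    """Root of func on [lo, hi] by bisection; func(lo) and func(hi) must differ in sign."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("func(lo) and func(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def plug_in_settings(g1_hat, f_hat, tau0, delta=0.05):
    """z0_hat solves g1_hat(z) = 0; then x0_hat solves f_hat(x, z0_hat) = tau0.
    Both searches run over [delta, 1 - delta]."""
    z0_hat = bisect_root(g1_hat, delta, 1.0 - delta)
    x0_hat = bisect_root(lambda x: f_hat(x, z0_hat) - tau0, delta, 1.0 - delta)
    return x0_hat, z0_hat


# Hypothetical smooth estimates: g(z) = 1 + (z - 0.3)**2, f(x, z) = x + 0.5 z, tau0 = 0.6.
x0_hat, z0_hat = plug_in_settings(lambda z: 2.0 * (z - 0.3), lambda x, z: x + 0.5 * z, 0.6)
print(x0_hat, z0_hat)   # approximately 0.45 and 0.3
```

Solving for $\hat z_0$ first and then conditioning on it mirrors the order of the argument above: the error in $\hat z_0$ feeds into $\hat x_0$ through the second term of (1.2).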


PROOF OF THEOREM 2.1 We begin with a lemma.




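LEMMA. Let $m > 0$. Suppose the bivariate function $f$ has continuous derivatives $f^{(i,j)}$, for $i \ge 0$, $j \ge 0$ and $i + j \le m$, on the square $[0,1]^2$. There exist numbers $\theta_{ij1}, \theta_{ij2}$ satisfying $0 < \theta_{ijk} < 1$, such that

$$f(u_1 + \delta_1, u_2 + \delta_2) = \sum_{0 \le i+j \le m-1} \frac{\delta_1^i \delta_2^j}{i!\,j!} f^{(i,j)}(u_1, u_2) + \sum_{i+j=m} \frac{\delta_1^i \delta_2^j}{i!\,j!} f^{(i,j)}(u_1 + \theta_{ij1}\delta_1,\ u_2 + \theta_{ij2}\delta_2)$$

whenever $u_1, u_2, u_1 + \delta_1, u_2 + \delta_2 \in [0,1]$.

To prove the lemma, write $f(u_1 + \delta_1, u_2 + \delta_2) = \{f(u_1 + \delta_1, u_2 + \delta_2) - f(u_1, u_2 + \delta_2)\} + f(u_1, u_2 + \delta_2)$, and repeatedly apply the univariate version of Taylor's theorem with remainder.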

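To prove (2.4), put $m = \langle\nu_1\rangle$ and apply the lemma. The moment conditions in (2.2) annihilate every term of the expansion except $f^{(r,s)}(\alpha/n, \beta/n)$, so that for integers $\alpha$ and $\beta$:

$$\begin{aligned}
E\hat f^{(r,s)}(\alpha/n, \beta/n) - f^{(r,s)}(\alpha/n, \beta/n)
&= n^{r+s}\sum_k\sum_l a_k b_l \sum_{i+j=m} \frac{(k/n)^i (l/n)^j}{i!\,j!}\Big\{f^{(i,j)}\Big(\frac{\alpha + \theta_{ij1}k}{n}, \frac{\beta + \theta_{ij2}l}{n}\Big) - f^{(i,j)}\Big(\frac{\alpha}{n}, \frac{\beta}{n}\Big)\Big\}\\
&= O\Big\{n^{r+s}\sum_k\sum_l |a_k b_l| \sum_{i+j=m} |k/n|^i |l/n|^j \big(|k/n|^{\nu_1 - m} + |l/n|^{\nu_1 - m}\big)\Big\} = O\{(nh_1)^{-(\nu_1 - r - s)}\}.
\end{aligned}$$

To prove (2.5), observe that

$$\operatorname{var}\{\hat f^{(r,s)}(i/n, j/n)\} = O\Big\{n^{2(r+s)}\Big(\sum_k a_k^2\Big)\Big(\sum_l b_l^2\Big)\Big\} = O\{(nh_1)^{2(r+s)} h_1^2\}.$$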


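PROOF OF THEOREM 2.3 Take $r = s = 0$, in which case we may assume $a_l = b_l$, and our estimator of $f$ is

$$\hat f(i/n, j/n) = \sum_{l_1}\sum_{l_2} a_{l_1} a_{l_2} Y_{i+l_1,\,j+l_2}.$$

Put $A_{ij} = \sum_{l_1}\sum_{l_2} a_{l_1} a_{l_2}\, g\{(j + l_2)/n\}^{1/2}\,\epsilon_{i+l_1,\,j+l_2}$ and

$$B_{ij} = \sum_{l_1}\sum_{l_2} a_{l_1} a_{l_2}\, f\{(i + l_1)/n, (j + l_2)/n\} - f(i/n, j/n).$$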
In this notation, $\hat r_{ij} = r_{ij} - A_{ij} - B_{ij}$, so that $n^{1-t}\hat g^{(t)}(j/n) = n^{1-t}\tilde g^{(t)}(j/n) - 2A_j + B_j$, where

$$A_j = \sum_{i=1}^{n}\sum_k (A_{i,j+k} + B_{i,j+k})\, r_{i,j+k}\, c_k, \qquad B_j = \sum_{i=1}^{n}\sum_k (A_{i,j+k} + B_{i,j+k})^2\, c_k.$$

Therefore, in view of (2.8), it suffices to prove that

$$E(A_j^2) = O[(nh_2^t)^2\{(nh_1)^{-2\nu_1} + h_1^2\}^2 + h_2^{2(t+1)}], \qquad E(B_j^2) = O[(nh_2^t)^2\{(nh_1)^{-2\nu_1} + h_1^2\}^2]. \tag{2.22}$$


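Since $A_j = \sum_i\sum_k (A_{i,j+k} + B_{i,j+k})\, g\{(j+k)/n\}^{1/2}\,\epsilon_{i,j+k}\, c_k$,

$$E(A_j^2) \le 2E\Big[\Big\{\sum_i\sum_k A_{i,j+k}\,\epsilon_{i,j+k}\, c_k\, g\{(j+k)/n\}^{1/2}\Big\}^2\Big] + 2E\Big[\Big\{\sum_i\sum_k B_{i,j+k}\,\epsilon_{i,j+k}\, c_k\, g\{(j+k)/n\}^{1/2}\Big\}^2\Big]. \tag{2.23}$$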


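Now,

$$\begin{aligned}
E(A_{i_1 j_1}\epsilon_{i_1 j_1} A_{i_2 j_2}\epsilon_{i_2 j_2}) &= \sum_{l_1, \ldots, l_4} a_{l_1} a_{l_2} a_{l_3} a_{l_4}\, g\{(j_1 + l_2)/n\}^{1/2} g\{(j_2 + l_4)/n\}^{1/2}\, E(\epsilon_{i_1+l_1,\,j_1+l_2}\,\epsilon_{i_1 j_1}\,\epsilon_{i_2+l_3,\,j_2+l_4}\,\epsilon_{i_2 j_2})\\
&= O\{h_1^4 + h_1^2\, I(i_1 = i_2,\ j_1 = j_2)\}.
\end{aligned}$$

Hence the first term on the right-hand side of (2.23) equals

$$O\Big[\sum_{i_1, i_2}\sum_{k_1, k_2} \{h_1^4 + h_1^2\, I(i_1 = i_2,\ k_1 = k_2)\}\, h_2^{2(t+1)}\, I(|k_1|, |k_2| \le C h_2^{-1})\Big] = O\{(nh_2^t)^2 h_1^4 + n h_1^2 h_2^{2t+1}\} = O\{(nh_2^t)^2 h_1^4 + h_2^{2(t+1)}\},$$

the last step using $2xy \le x^2 + y^2$ with $x = nh_2^t h_1^2$ and $y = h_2^{t+1}$.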

Since $|B_{ij}| = O\{(nh_1)^{-\nu_1}\}$, the second term on the right-hand side of (2.23) equals

$$2\sum_i\sum_k B_{i,j+k}^2\, c_k^2\, g\{(j+k)/n\} = O\{(nh_1)^{-2\nu_1}\, n h_2^{2t+1}\} = O\{(nh_2^t)^2 (nh_1)^{-4\nu_1} + h_2^{2(t+1)}\}.$$

Combining estimates from (2.23) down we get the first part of (2.22). The second part follows from the facts that $|c_k| \le C h_2^{t+1} I(|k| \le C h_2^{-1})$, $E(A_{ij}^4) = O(h_1^4)$ and $B_{ij} = O\{(nh_1)^{-\nu_1}\}$.
PROOF OF THEOREM 2.4 If $j/n \le u \le (j+1)/n$ then

$$\hat g^{(2)}(u) = (nu - j)\,\hat g^{(2)}\{(j+1)/n\} + (j + 1 - nu)\,\hat g^{(2)}(j/n),$$

whence

$$\int_{j/n}^{(j+1)/n} \hat g^{(2)}(u)\,du = (2n)^{-1}[\hat g^{(2)}(j/n) + \hat g^{(2)}\{(j+1)/n\}].$$

Therefore if $j/n \le z \le (j+1)/n$ and $j \ge j_0 + 2$,

$$\begin{aligned}
\hat g^{(1)}(z) &= \hat g^{(1)}(j_0/n) + n^{-1}\sum_{i=j_0+1}^{j-1} \hat g^{(2)}(i/n) + T_1 + T_2\\
&= \hat g^{(1)}(j_0/n) + \hat g_*^{(1)}\{(j-1)/n\} - \hat g_*^{(1)}(j_0/n) + T_1 + T_2,
\end{aligned}$$

where

$$\hat g_*^{(1)}(l/n) = \sum_{i=1}^{n}\sum_k d_k\, \hat r_{i,l+k}^2, \qquad d_k = \sum_{m=0}^{\infty} c_{k+m},$$

$$T_1 = \int_{j/n}^{z} \hat g^{(2)}(u)\,du, \qquad T_2 = (2n)^{-1}\{\hat g^{(2)}(j_0/n) + \hat g^{(2)}(j/n)\}.$$

If $\{c_k\}$ satisfies condition (2.2) with $t = 2$ then $\{d_k\}$ satisfies the same condition (stated there for $\{c_k\}$) with $t = 1$. Therefore Theorem 2.4 will follow from Theorem 2.3 if we prove that for $i = 1$ and $2$,

$$E(T_i^2) = O[\{(nh_2)^{-2(\nu_2 - 1)} + nh_2^3\} + (nh_2)^2\{(nh_1)^{-2\nu_1} + h_1^2\}^2]. \tag{2.24}$$

(The case of $j$ values with $j \le j_0 + 1$ may be treated similarly. Note that we may not, and do not, assume existence of $g^{(2)}$.)

Observe that

$$E(T_i^2) \le n^{-2}\sup_{j/n \le u \le (j+1)/n} E\{\hat g^{(2)}(u)^2\} \le 2n^{-2}\max_{l = j,\,j+1} E\{\hat g^{(2)}(l/n)^2\}.$$

Let $A_j$, $B_j$ be as in the proof of Theorem 2.3, this time with $t = 2$. Then $\hat g^{(2)}(l/n) = \tilde g^{(2)}(l/n) + n(B_l - 2A_l)$, and (as shown during our proof of Theorem 2.3) $E(A_l^2) + E(B_l^2) = O[(nh_2^2)^2\{(nh_1)^{-2\nu_1} + h_1^2\}^2 + h_2^6]$. Also,

$$n^{-2}E\{\hat g^{(2)}(l/n)^2\} \le 16[n^{-2}E\{\tilde g^{(2)}(l/n)^2\} + E(A_l^2) + E(B_l^2)],$$
and since $\tilde g^{(2)}(l/n) = n\sum_i\sum_k c_k\, g\{(l+k)/n\}\,\epsilon_{i,l+k}^2$,

$$\begin{aligned}
E\{\tilde g^{(2)}(l/n)^2\} &= \operatorname{var}\{\tilde g^{(2)}(l/n)\} + \{E\tilde g^{(2)}(l/n)\}^2\\
&= O\Big[n^3\sum_k c_k^2 + \Big\{n^2\sum_k c_k\, g\{(l+k)/n\}\Big\}^2\Big] = O\{n^3 h_2^5 + (nh_2)^{2(2 - \nu_2)}\}.
\end{aligned}$$

Combining all these estimates we conclude that for $i = 1$ and $2$,

$$E(T_i^2) = O[nh_2^5 + (nh_2)^{-2(\nu_2 - 1)} h_2^2 + (nh_2)^2\{(nh_1)^{-2\nu_1} + h_1^2\}^2],$$

from which (2.24) follows.

3. Optimal rates of convergence

In this section we show that the convergence rates derived in Section 2 for kernel-type estimators cannot be improved upon by other estimators. Our optimality results will be in the form of "worst possible" rates computed over function classes. It is a trivial matter to obtain the same rates for our kernel-type estimators by extending arguments in Section 2. In the next paragraph we define the function classes and state the extended results.

Given positive numbers $\nu_1$, $\nu_2$ and $B$, let $\mathcal C_1 = \mathcal C_1(\nu_1, B)$ be the class of bivariate functions $f$ on $[0,1]^2$ for which $\sup|f^{(i,j)}| \le B$ whenever $i \ge 0$, $j \ge 0$ and $i + j \le \langle\nu_1\rangle$; and

$$|f^{(i,j)}(u,v) - f^{(i,j)}(x,y)| \le B(|u - x|^{\nu_1 - \langle\nu_1\rangle} + |v - y|^{\nu_1 - \langle\nu_1\rangle})$$

whenever $u, v, x, y \in [0,1]$, $i \ge 0$, $j \ge 0$ and $i + j = \langle\nu_1\rangle$. Let $\mathcal C_2 = \mathcal C_2(\nu_2, B)$ be the class of nonnegative univariate functions $g$ on $[0,1]$ for which $\sup|g^{(i)}| \le B$ whenever $0 \le i \le \langle\nu_2\rangle$, and

$$|g^{(\langle\nu_2\rangle)}(x) - g^{(\langle\nu_2\rangle)}(y)| \le B|x - y|^{\nu_2 - \langle\nu_2\rangle}$$

whenever $x, y \in [0,1]$. Let $\mathcal C_3 = \mathcal C_3(B)$ be the class of nonnegative univariate functions $g$ on $[0,1]$ such that $\sup g \le B$. Take $\hat f^{(r,s)}$, $\tilde g^{(t)}$ and $\hat g^{(t)}$ to be the estimators defined at (2.3), (2.7) and (2.9) respectively, calculated by linear interpolation at points which are not integer multiples of $n^{-1}$. (See Remark 2.1.) Assume that $\nu_1 > r + s$ and $\nu_2 > t$. For appropriate choices of the smoothing parameters $h_1$ and $h_2$, and for each $0 < \delta < \tfrac12$, there exist positive constants $C_1$, $C_2$ and $C_3$ depending on $\nu_1$, $\nu_2$ and $B$ such that

$$\sup_{f \in \mathcal C_1,\, g \in \mathcal C_3}\ \sup_{\delta \le x, z \le 1-\delta} E_{f,g}\{\hat f^{(r,s)}(x,z) - f^{(r,s)}(x,z)\}^2 \le C_1 n^{-2(\nu_1 - r - s)/(\nu_1 + 1)},$$

$$\sup_{g \in \mathcal C_2}\ \sup_{\delta \le z \le 1-\delta} E_g\{\tilde g^{(t)}(z) - g^{(t)}(z)\}^2 \le C_2 n^{-4(\nu_2 - t)/(2\nu_2 + 1)},$$

$$\sup_{f \in \mathcal C_1,\, g \in \mathcal C_2}\ \sup_{\delta \le z \le 1-\delta} E_{f,g}\{\hat g^{(t)}(z) - g^{(t)}(z)\}^2 \le C_3 \max(n^{-4(\nu_2 - t)/(2\nu_2 + 1)},\ n^{-4\nu_1(\nu_2 - t)/\{(\nu_1+1)\nu_2\}}).$$

These results, but without the suprema over $f$ and $g$, were obtained in Remarks 2.2, 2.3 and 2.8 respectively. The methods of proof, smoothing parameters and convergence rates are exactly the same in the present uniform context.

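We now show that, for any nonparametric estimators $\hat f^{(r,s)}$, $\tilde g^{(t)}$ and $\hat g^{(t)}$ (not just for our kernel estimators), the above inequalities may be reversed. Let $\hat f^{(r,s)}$ and $\hat g^{(t)}$ be nonparametric estimators of $f^{(r,s)}$ and $g^{(t)}$ respectively, based on model (2.1), and let $\tilde g^{(t)}$ be a nonparametric estimator of $g^{(t)}$ based on the true residuals $r_{ij} = g(j/n)^{1/2}\epsilon_{ij}$, $1 \le i, j \le n$. Assume that the errors $\epsilon_{ij}$ are independent and identically distributed as normal $N(0,1)$, and that $\nu_1 > r + s$ and $\nu_2 > t$. We claim that for any fixed $(x_0, z_0) \in (0,1)^2$ and arbitrary nonparametric estimators $\hat f^{(r,s)}$, $\tilde g^{(t)}$ and $\hat g^{(t)}$, there exist positive constants $D_1$, $D_2$ and $D_3$ such that for large $n$,

$$\sup_{f \in \mathcal C_1,\, g \in \mathcal C_3} E_{f,g}\{\hat f^{(r,s)}(x_0, z_0) - f^{(r,s)}(x_0, z_0)\}^2 \ge D_1 n^{-2(\nu_1 - r - s)/(\nu_1 + 1)}, \tag{3.1}$$

$$\sup_{g \in \mathcal C_2} E_g\{\tilde g^{(t)}(z_0) - g^{(t)}(z_0)\}^2 \ge D_2 n^{-4(\nu_2 - t)/(2\nu_2 + 1)}, \tag{3.2}$$

$$\sup_{f \in \mathcal C_1,\, g \in \mathcal C_2} E_{f,g}\{\hat g^{(t)}(z_0) - g^{(t)}(z_0)\}^2 \ge D_3 \max(n^{-4(\nu_2 - t)/(2\nu_2 + 1)},\ n^{-4\nu_1(\nu_2 - t)/\{(\nu_1+1)\nu_2\}}). \tag{3.3}$$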

Results (3.1) and (3.2) may be viewed as lower bounds to convergence rates for estimation of mean functions in nonparametric regression with uniformly bounded variances. In the case of (3.2), the regression is replicated $n$ times at each design point. Both results may be derived by modifying arguments of Stone (1980), who treats lower bounds in non-replicated regression. Result (3.3) is more difficult to obtain, and is proved in detail later in this section.

Next we turn attention to estimation of $z_0$, the unique element of $[0,1]$ such that $\inf g = g(z_0)$. The rate of convergence for our kernel-based estimator was described by (2.17). To extend this to a rate that is uniform over a function class, we must define a new function class, as follows.

Fix $\nu_2 > 2$, $0 < \delta < \frac12$ and $0 < c < \frac12 B$. Write $\mathcal{D}_2 = \mathcal{D}_2(\nu_2,\delta,B,c)$ for the class of nonnegative functions $g$ which are in $\mathcal{C}_2(\nu_2,B)$ and which satisfy:

- $\frac12 c \le g^{(2)}(z) \le 2c$ for $z\in[0,1]$;
- $g^{(1)}(z_0) = 0$ for some $z_0\in[\delta,1-\delta]$.

It follows that each $g\in\mathcal{D}_2$ is strictly convex, with minimum attained at its unique turning point $z_0$. Fix $\nu_1 > 0$ and let $\mathcal{C}_1 = \mathcal{C}_1(\nu_1,B)$ be the function class defined earlier. Then if $\hat z_0$ is our kernel-based estimator of $z_0$, and if $\{a_n\}$ is a positive sequence with $a_n\to\infty$,

$$\sup_{f\in\mathcal{C}_1,\,g\in\mathcal{D}_2} P_{f,g}\Big[|\hat z_0 - z_0| > a_n\max\big\{n^{-2(\nu_2-1)/(2\nu_2+1)},\ n^{-2\nu_1(\nu_2-1)/\{(\nu_1+1)\nu_2\}}\big\}\Big] \to 0 \tag{3.4}$$

as $n\to\infty$. (Here $\nu_1 > 0$ and $\nu_2 > 2$.) This is a version of (2.17) that holds uniformly over function classes, and it is proved in the same manner as (2.17).

To state a converse result, let $\hat z_0$ be any nonparametric estimator of $z_0$ and $\{a_n\}$ be any positive sequence. We claim that if (3.4) holds then $a_n\to\infty$. An outline of the proof of this fact will be given later in this section.
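A minimal plug-in sketch may help fix ideas about estimating $z_0$. It is not the kernel estimator of Section 2, whose kernels and bandwidths are not reproduced here. Instead it:

- assumes $f\equiv0$, so that the data are true residuals;
- smooths the column means of $Y_{ij}^2$ with a second-order Epanechnikov kernel;
- takes the minimizer over an interior grid.

The function name, the bandwidth `h`, the margin `delta` and the test function $g(z) = 1 + 4(z-\frac12)^2$ are all hypothetical choices.

```python
import numpy as np


def estimate_z0(Y, h=0.1, delta=0.1):
    """Plug-in estimate of z0 = argmin g from an n_rep x n array of residuals.

    Column j holds replicates observed at z_j = (j + 1)/n, so the column
    means of Y**2 estimate g(z_j) (cf. Y_ij^2 = g(j/n) + eta_ij).
    """
    _, n = Y.shape
    z = np.arange(1, n + 1) / n
    col_means = (Y ** 2).mean(axis=0)
    grid = z[(z >= delta) & (z <= 1 - delta)]       # stay away from the edges
    u = (z[None, :] - grid[:, None]) / h
    w = np.clip(1.0 - u ** 2, 0.0, None)            # Epanechnikov weights
    g_hat = (w * col_means).sum(axis=1) / w.sum(axis=1)
    return grid[np.argmin(g_hat)], g_hat


# Hypothetical example: f = 0 (true residuals), g(z) = 1 + 4 (z - 1/2)^2, so z0 = 1/2
rng = np.random.default_rng(0)
n = 400
z = np.arange(1, n + 1) / n
g = 1.0 + 4.0 * (z - 0.5) ** 2
Y = np.sqrt(g)[None, :] * rng.standard_normal((n, n))
z0_hat, _ = estimate_z0(Y)
print(z0_hat)
```

Under the smoothness assumed for $\mathcal{D}_2$ one would use higher-order kernels and bandwidths tuned to $\nu_2$. The second-order kernel here is only for simplicity.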

Similar results for estimation of $x_0$ require a new class $\mathcal{D}_1$ of mean functions $f$. Fix $\delta\in(0,\frac12)$, $\nu_1 > 1$ and $\tau_0$. Let $\mathcal{D}_1 = \mathcal{D}_1(\nu_1,\delta,\tau_0,B,d)$ be the class of functions $f$ which:

- are in $\mathcal{C}_1(\nu_1,B)$;
- satisfy $\frac12 d \le |f^{(0,1)}(x,z)|,\ |f^{(1,0)}(x,z)| \le 2d$ for $(x,z)\in[0,1]^2$;
- are such that for each $z\in[\delta,1-\delta]$ the equation $f(x,z) = \tau_0$ has a unique solution $x(z)$.

Then if $\hat x_0$ is our kernel-based estimator of $x_0 = x(z_0)$, and if $\{a_n\}$ is a positive sequence with $a_n\to\infty$,
$$\sup_{f\in\mathcal{D}_1,\,g\in\mathcal{D}_2} P_{f,g}\Big[|\hat x_0 - x_0| > a_n\max\big\{n^{-\nu_1/(\nu_1+1)},\ n^{-2(\nu_2-1)/(2\nu_2+1)}\big\}\Big] \to 0 \tag{3.5}$$

as $n\to\infty$. (Here $\nu_1 > 1$ and $\nu_2 > 2$.) Conversely, if $\hat x_0$ is any nonparametric estimator of $x_0$, if $\{a_n\}$ is a positive sequence and if (3.5) holds, then $a_n\to\infty$.
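Once $\hat z_0$ is available, $\hat x_0$ is the solution of $\hat f(x,\hat z_0) = \tau_0$. Functions in $\mathcal{D}_1$ satisfy $|f^{(1,0)}| \ge \frac12 d$, so the map $x\mapsto f(x,z)$ is strictly monotone and a bracketing root finder is a natural choice.

The sketch below uses plain bisection on any callable `f_hat`. The name `solve_x0` and its arguments are hypothetical. The usage lines plug in the function $F(x,z) = (x+1)^2 + (z+1)^2$, which the text uses later as an example. With $\tau_0 = 2$ and $z = 0$, the root is $x = 0$.

```python
def solve_x0(f_hat, z0_hat, tau0, lo=0.0, hi=1.0, tol=1e-10, max_iter=200):
    """Find x with f_hat(x, z0_hat) = tau0 by bisection on [lo, hi].

    Assumes x -> f_hat(x, z0_hat) - tau0 changes sign on [lo, hi], as it does
    when f is strictly monotone in x and the root is bracketed.
    """
    flo = f_hat(lo, z0_hat) - tau0
    fhi = f_hat(hi, z0_hat) - tau0
    if flo * fhi > 0:
        raise ValueError("no sign change: target value not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        fmid = f_hat(mid, z0_hat) - tau0
        if flo * fmid <= 0:        # root lies in [lo, mid]
            hi = mid
        else:                      # root lies in [mid, hi]
            lo, flo = mid, fmid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def F(x, z):
    """Example mean function F(x, z) = (x + 1)^2 + (z + 1)^2."""
    return (x + 1.0) ** 2 + (z + 1.0) ** 2


x0_hat = solve_x0(F, z0_hat=0.0, tau0=2.0, lo=-1.0, hi=1.0)
print(x0_hat)
```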

We  conclude  this  section  with  a  detailed  proof  of  (3.3),  and  sketches  of  proofs  of  the 
rates  of  convergence  described  by  (3.4)  and  (3.5). 

Proof of (3.3). It is notationally simpler to assume a regular design on the square $[-1,1]^2$ instead of on $[0,1]^2$, and to take $z_0 = 0$. There is no loss of generality in confining attention to this situation. We therefore suppose, instead of model (2.1), that $Y_{ij} = f(i/n,j/n) + g(j/n)^{1/2}\epsilon_{ij}$, $-n \le i,j \le n$, where the $\epsilon_{ij}$'s are i.i.d. $N(0,1)$. Define the function classes $\mathcal{C}_1$ and $\mathcal{C}_2$ on $[-1,1]^2$ and $[-1,1]$ respectively, instead of $[0,1]^2$ and $[0,1]$.

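In the case $\nu_1 \ge \nu_2/(\nu_2+1)$, we must prove that for large $n$,

$$\sup_{f\in\mathcal{C}_1,\,g\in\mathcal{C}_2} E_{f,g}\{\hat g^{(t)}(z_0) - g^{(t)}(z_0)\}^2 \ge D_3\,n^{-4(\nu_2-t)/(2\nu_2+1)}.$$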

This inequality follows from

$$\sup_{f\equiv0,\ g\in\mathcal{C}_2} E_{f,g}\{\hat g^{(t)}(z_0) - g^{(t)}(z_0)\}^2 \ge D_3\,n^{-4(\nu_2-t)/(2\nu_2+1)}, \tag{3.6}$$

which is true for all $\nu_2 > t$. To prove (3.6), note that when $f\equiv0$ our model entails $Y_{ij}^2 = g(j/n) + \eta_{ij}$, where $\eta_{ij} = g(j/n)(\epsilon_{ij}^2 - 1)$. This is a replicated regression model, with mean function $g$ and residuals of uniformly bounded variance. Techniques of Stone (1980), giving lower bounds to convergence rates for non-replicated regression models, are easily modified to produce (3.6).

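When $\nu_1 < \nu_2/(\nu_2+1)$, we must show that for large $n$,

$$\sup_{f\in\mathcal{C}_1,\,g\in\mathcal{C}_2} E_{f,g}\{\hat g^{(t)}(z_0) - g^{(t)}(z_0)\}^2 \ge D_3\,n^{-4\nu_1(\nu_2-t)/\{(\nu_1+1)\nu_2\}}. \tag{3.7}$$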

Our first proof of this inequality is valid for

$$\nu_1 < \nu_2/(\nu_2+1), \qquad \nu_2 \ge \max(t,1), \qquad t = 0,1,\ldots \tag{3.8}$$

The only case of interest not covered by these conditions is

$$\nu_1 < \nu_2/(\nu_2+1), \qquad 0 < \nu_2 < 1, \qquad t = 0, \tag{3.9}$$

and we shall treat this separately at the end of our main proof.

Assume condition (3.8). Let $\psi_1$, $\psi_2$ be real-valued functions having at least $(\nu_2)+2$ bounded derivatives on $(-\infty,\infty)$, such that:

- $\psi_1$ vanishes outside $[0,1]$ and $\psi_2$ vanishes outside $[-1,1]$;
- $\psi_1(\frac12)\ne0$ and $\psi_2^{(t)}(0)\ne0$;
- $\sup|\psi_j^{(i)}| \le \frac12 B$ for $0 \le i \le (\nu_2)+2$ and $j = 1,2$.

Fix $c > 0$ and put $m_1 = [cn^{\nu_1/(\nu_1+1)}]$, $m = [c^{(2\nu_1-\nu_2)/\nu_2}n^{(\nu_2-2\nu_1)/\{(\nu_1+1)\nu_2\}}]$ (where $[x]$ denotes the integer part of $x$), $m_2 = m_1m$ and $\delta_i = m_i/n$ for $i = 1,2$. Let $m_0$ be an integer such that $m_0m_1 \le n$ and $m_0\sim n/m_1$. Since we are assuming $\nu_2 \ge 1$, we have $\nu_2/(\nu_2+1) \le \frac12\nu_2$, and so the hypothesis $\nu_1 < \nu_2/(\nu_2+1)$ entails $\nu_1 < \frac12\nu_2$. This implies $m\to\infty$ as $n\to\infty$.

Let $\{I_{ij},\ -m_0 \le i \le m_0-1 \text{ and } -m \le j \le m-1\}$ be a sequence of $\pm1$'s, put $A(x,y) = \delta_1^{\nu_1}\psi_1(x/\delta_1)\psi_1(y/\delta_1)$, and define $f = f(\cdot\mid\{I_{ij}\})$ by

$$f(x,y) = I_{ij}\,A(x - n^{-1}m_1i,\ y - n^{-1}m_1j) \ \text{ if } (x,y)\in l_{ij}, \qquad f(x,y) = 0 \ \text{ if } (x,y)\notin \textstyle\bigcup_{i,j} l_{ij},$$

where $l_{ij} = [n^{-1}m_1i,\ n^{-1}m_1(i+1)) \times [n^{-1}m_1j,\ n^{-1}m_1(j+1))$ for $-m_0 \le i \le m_0-1$ and $-m \le j \le m-1$, and where $\bigcup_{i,j}$ denotes the union over these values of $i$, $j$. Write $\mathcal{F}$ for the class of all such $f$'s.

Let $G(x) \equiv \delta_2^{\nu_2}\psi_2(x/\delta_2)$, $g_0\equiv1$, $g_1 = (1-G)^{-1}$ and $\mathcal{G} = \{g_0,g_1\}$. For large $n$, $\mathcal{F}\subseteq\mathcal{C}_1$ and $\mathcal{G}\subseteq\mathcal{C}_2$, provided $B > 1$. (The latter restriction may be removed at the cost of notational complexity.)
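The perturbation $G(x) = \delta_2^{\nu_2}\psi_2(x/\delta_2)$ and the pair $g_0\equiv1$, $g_1 = (1-G)^{-1}$ can be made concrete. The sketch uses a hypothetical choice $\psi_2(u) = 0.1\exp\{-1/(1-u^2)\}$ on $(-1,1)$, zero elsewhere. This function is infinitely differentiable, vanishes outside $[-1,1]$ and has $\psi_2(0)\ne0$, so it serves the case $t = 0$. The scale factor $0.1$ stands in for the bound $\sup|\psi_2^{(i)}| \le \frac12 B$ and is not taken from the text.

The sketch checks two things:

- $g_1(0) - g_0(0)$ behaves like $G(0) = \delta_2^{\nu_2}\psi_2(0)$;
- $g_0 = g_1$ away from $[-\delta_2,\delta_2]$.

```python
import math


def bump(u):
    """Smooth bump exp{-1/(1 - u^2)} on (-1, 1), zero elsewhere (hypothetical psi_2 shape)."""
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0


def G(x, delta2, nu2, scale=0.1):
    """G(x) = delta2**nu2 * psi2(x / delta2), with psi2 = scale * bump."""
    return delta2 ** nu2 * scale * bump(x / delta2)


def g1(x, delta2, nu2):
    """Perturbed variance function g1 = 1/(1 - G); g0 is identically 1."""
    return 1.0 / (1.0 - G(x, delta2, nu2))


delta2, nu2 = 1e-2, 2.0
ratio = (g1(0.0, delta2, nu2) - 1.0) / G(0.0, delta2, nu2)
print(ratio)                    # close to 1, since g1(0) - g0(0) = G(0)/(1 - G(0))
print(g1(0.5, delta2, nu2))     # outside [-delta2, delta2], g1 equals g0
```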

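Let $\hat g^{(t)}(0)$ be any nonparametric estimator of $g^{(t)}(0)$. It suffices to show that

$$\liminf_{n\to\infty}\ n^{4\nu_1(\nu_2-t)/\{(\nu_1+1)\nu_2\}}\sup_{f\in\mathcal{F},\,g\in\mathcal{G}} E_{f,g}\{\hat g^{(t)}(0) - g^{(t)}(0)\}^2 > 0.$$

This result will follow if we prove that

$$\liminf_{n\to\infty}\ n^{4\nu_1(\nu_2-t)/\{(\nu_1+1)\nu_2\}}\sup_{g\in\mathcal{G}} E_g^*\{\hat g^{(t)}(0) - g^{(t)}(0)\}^2 > 0, \tag{3.10}$$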

where $E^*$ denotes expectation under the model

$$Y_{ij} = f(i/n,j/n\mid\{I_{ij}\}) + g(j/n)^{1/2}\epsilon_{ij}, \qquad -n \le i,j \le n,$$

in which the $I_{ij}$'s are independent symmetric $\pm1$ variables, independent of the $\epsilon_{ij}$'s, which are i.i.d. $N(0,1)$.
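If (3.10) fails, choose a sequence $\{n_k\}$ such that the left-hand side of (3.10) converges to zero as $n\to\infty$ through $\{n_k\}$. We have

$$n^{2\nu_1(\nu_2-t)/\{(\nu_1+1)\nu_2\}}\,|g_1^{(t)}(0) - g_0^{(t)}(0)| \sim n^{2\nu_1(\nu_2-t)/\{(\nu_1+1)\nu_2\}}\,\delta_2^{\nu_2-t}|\psi_2^{(t)}(0)| \sim \text{const.}$$

Consider the decision rule $D$ given by $D = 0$ if $|\hat g^{(t)}(0) - g_0^{(t)}(0)| \le |\hat g^{(t)}(0) - g_1^{(t)}(0)|$, and $D = 1$ otherwise. Because of the display above, $D$ provides asymptotically perfect discrimination between $g_0^{(t)}(0)$ and $g_1^{(t)}(0)$ as $n\to\infty$ through $\{n_k\}$, in the sense that

$$P_{g_0}^*(D = 1) + P_{g_1}^*(D = 0) \to 0.$$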


We shall complete our proof by showing that this is impossible, even for the likelihood ratio (LR) rule. It suffices to show that if the true $g$ is $g_0$, then the chance that the LR rule picks $g_1$ is bounded away from zero as $n\to\infty$.

We may confine attention to the LR rule based on $\{Y_{ij},\ |i| \le m_0m_1 \text{ and } |j| \le mm_1\}$. Note that $m_0m_1\sim n$, and $g_0(x) = g_1(x)$ for $|x| > mm_1/n$. Therefore $Y_{ij}$ with $|j| > mm_1$ provides no information for discriminating between $g_0$ and $g_1$.

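Let $a, b, \alpha, \beta$ be integers satisfying $-m_0 \le a \le m_0-1$, $-m \le \alpha \le m-1$ and $1 \le b,\beta \le m_1$. If $i = am_1+b$ and $j = \alpha m_1+\beta$, write $Y_{ab\alpha\beta}$ and $\epsilon_{ab\alpha\beta}$ for $Y_{ij}$ and $\epsilon_{ij}$, respectively. For fixed $a$, $\alpha$, the likelihood of $\{Y_{ab\alpha\beta},\ 1 \le b,\beta \le m_1\}$ is proportional to

$$\Big[\exp\Big(-\tfrac12\sum_b\sum_\beta\{Y_{ab\alpha\beta} - A(b/n,\beta/n)\}^2\big/g\{(\alpha m_1+\beta)/n\}\Big) + \exp\Big(-\tfrac12\sum_b\sum_\beta\{Y_{ab\alpha\beta} + A(b/n,\beta/n)\}^2\big/g\{(\alpha m_1+\beta)/n\}\Big)\Big]\prod_{\beta=1}^{m_1} g\{(\alpha m_1+\beta)/n\}^{-m_1/2}.$$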
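The chance that the LR rule wrongly picks $g_1$ equals the probability that

$$\begin{aligned}
&\Big\{\prod_i\prod_j g_1(j/n)\Big\}^{-1/2}\exp\Big[-\tfrac12\sum_a\sum_b\sum_\alpha\sum_\beta\epsilon_{ab\alpha\beta}^2\big/g_1\{(\alpha m_1+\beta)/n\}\Big]\\
&\qquad\times\prod_a\prod_\alpha\Big(1 + \exp\Big[-2\sum_b\sum_\beta\{A(b/n,\beta/n)^2 + A(b/n,\beta/n)\epsilon_{ab\alpha\beta}\}\big/g_1\{(\alpha m_1+\beta)/n\}\Big]\Big)\\
&> \exp\Big(-\tfrac12\sum_a\sum_b\sum_\alpha\sum_\beta\epsilon_{ab\alpha\beta}^2\Big)\prod_a\prod_\alpha\Big(1 + \exp\Big[-2\sum_b\sum_\beta\{A(b/n,\beta/n)^2 + A(b/n,\beta/n)\epsilon_{ab\alpha\beta}\}\Big]\Big).
\end{aligned}$$

Here we have used symmetry of the Normal distribution, which implies that $\epsilon_{ab\alpha\beta}$ and $I_{a\alpha}\epsilon_{ab\alpha\beta}$ have the same distribution. Equivalently, since $G = 1 - g_1^{-1}$, this probability equals the chance that

$$\begin{aligned}
&\exp\Big(n\sum_j\big[G(j/n) + \log\{1 - G(j/n)\}\big] + \tfrac12\sum_a\sum_b\sum_\alpha\sum_\beta(\epsilon_{ab\alpha\beta}^2 - 1)\,G\{(\alpha m_1+\beta)/n\}\Big)\\
&\qquad\times\prod_a\prod_\alpha\frac{1 + \exp\big(-2\sum_b\sum_\beta\{A(b/n,\beta/n)^2 + A(b/n,\beta/n)\epsilon_{ab\alpha\beta}\}[1 - G\{(\alpha m_1+\beta)/n\}]\big)}{1 + \exp\big(-2\sum_b\sum_\beta\{A(b/n,\beta/n)^2 + A(b/n,\beta/n)\epsilon_{ab\alpha\beta}\}\big)} > 1.
\end{aligned}$$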

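Denote the left-hand side of this inequality by $B$ and put

$$d_1 = \sum_b\sum_\beta A(b/n,\beta/n)^2 \sim c^{2(\nu_1+1)}\Big(\int\psi_1^2\Big)^2 = d,$$

$$N_{a\alpha} = d_1^{-1/2}\sum_b\sum_\beta A(b/n,\beta/n)\,\epsilon_{ab\alpha\beta},$$

which is distributed as $N(0,1)$.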
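The asymptotic relation $d_1 \sim c^{2(\nu_1+1)}(\int\psi_1^2)^2$ is a Riemann-sum statement. Since $A(b/n,\beta/n) = \delta_1^{\nu_1}\psi_1(b/m_1)\psi_1(\beta/m_1)$, one has

$$d_1 = \delta_1^{2\nu_1}\Big\{\sum_b\psi_1(b/m_1)^2\Big\}^2 \approx \delta_1^{2\nu_1}m_1^2\Big(\int\psi_1^2\Big)^2 = n^2\delta_1^{2\nu_1+2}\Big(\int\psi_1^2\Big)^2,$$

and $n^2\delta_1^{2\nu_1+2}\to c^{2(\nu_1+1)}$. The check below reads $A$ and $m_1 = [cn^{\nu_1/(\nu_1+1)}]$ from the construction above. It uses a hypothetical bump $\psi_1(u) = \exp[-1/\{u(1-u)\}]$ on $(0,1)$.

```python
import math


def psi1(u):
    """Hypothetical smooth bump vanishing outside (0, 1)."""
    return math.exp(-1.0 / (u * (1.0 - u))) if 0.0 < u < 1.0 else 0.0


def d1_and_limit(n, nu1, c=1.0, grid=100_000):
    """Return (d_1, c^{2(nu1+1)} (int psi1^2)^2) for the construction in the text."""
    m1 = int(c * n ** (nu1 / (nu1 + 1)))     # m_1 = [c n^{nu1/(nu1+1)}]
    delta1 = m1 / n
    # A(b/n, beta/n) = delta1^nu1 psi1(b/m1) psi1(beta/m1), so the double sum factorizes
    row = sum(psi1(b / m1) ** 2 for b in range(1, m1 + 1))
    d1 = delta1 ** (2 * nu1) * row ** 2
    # Midpoint rule for int_0^1 psi1(u)^2 du
    int_sq = sum(psi1((k + 0.5) / grid) ** 2 for k in range(grid)) / grid
    return d1, c ** (2 * (nu1 + 1)) * int_sq ** 2


for n, nu1, c in [(10 ** 8, 0.4, 1.0), (10 ** 8, 0.5, 2.0)]:
    d1, d = d1_and_limit(n, nu1, c)
    print(n, nu1, c, d1 / d)
```

The ratio $d_1/d$ departs from 1 mainly through the integer part in $m_1$, whose relative effect is of order $1/m_1$.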

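Noting that $\delta_1\sim cn^{-1/(\nu_1+1)}$ and $\delta_2\sim c^{2\nu_1/\nu_2}n^{-2\nu_1/\{(\nu_1+1)\nu_2\}}$, we see that

$$\begin{aligned}
\log B &= -\{1+o(1)\}\,\tfrac12 n\sum_j G(j/n)^2 + O_p\Big[\Big\{n\sum_j G(j/n)^2\Big\}^{1/2}\Big]\\
&\quad + \{1+o_p(1)\}\,4\sum_a\sum_\alpha\exp\{-2(d_1 + d_1^{1/2}N_{a\alpha})\}\big[1 + \exp\{-2(d_1 + d_1^{1/2}N_{a\alpha})\}\big]^{-1}(d_1 + d_1^{1/2}N_{a\alpha})\,G(\alpha m_1/n)\\
&= -\{1+o_p(1)\}\,\tfrac12 n^2\delta_2^{2\nu_2+1}\int\psi_2^2 + \{1+o_p(1)\}\,8\,\delta_1^{-2}\delta_2^{\nu_2+1}s\\
&= c^{2\nu_1/\nu_2}\,n^{2\{\nu_2-\nu_1(\nu_2+1)\}/\{(\nu_1+1)\nu_2\}}\Big\{8c^{2\nu_1-2}s - \tfrac12 c^{4\nu_1}\int\psi_2^2 + o_p(1)\Big\},
\end{aligned}$$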
where $s = (\int\psi_2)\,E\big([\exp\{2(d + d^{1/2}N)\} + 1]^{-1}(d + d^{1/2}N)\big)$, $N$ denotes a standard normal random variable, and $c$ is chosen so that the expectation is nonzero. Choose $\psi_2$ to be either nonnegative or nonpositive, selecting the sign so that $s > 0$. Then choose $|\psi_2|$ so small that $8c^{2\nu_1-2}s - \frac12 c^{4\nu_1}\int\psi_2^2 > 0$.

Then $B\to+\infty$ in probability. This implies that the chance that the LR rule picks $g_1$, when $g_0$ is the true variance function, converges to one as $n\to\infty$. This completes our proof in the presence of condition (3.8).

The proof when (3.9) holds is simpler. Adopt the same notation as before, except that $m$ is re-defined as $m_0$ ($\sim n/m_1$), $\nu_2 = 1$, and $m_2$ and $\delta_2$ are no longer needed. Pursue the same argument.

We next sketch a proof of the fact that if (3.4) holds for a nonparametric estimator $\hat z_0$ of $z_0$, then $a_n\to\infty$. We treat only the case $\nu_1 < \nu_2/(\nu_2+1)$, which is the context of the major part of our proof of (3.3). The case $\nu_1 \ge \nu_2/(\nu_2+1)$ is similar. Our argument is almost identical to that employed to derive (3.3).

Assume that estimation takes place on $[-1,1]^2$. Use the same class of $f$'s, but change $g_0$, $g_1$ from $1$, $(1-G)^{-1}$ respectively to $H$, $H+G$ respectively. Here $G$ is as before and $H$ is a positive, strictly convex function with unique minimum interior to $[-1,1]$. For definiteness we take $H(z) = (1+z^2)B_0$, where the choice of the positive constant $B_0$ depends on the value of $B$. Let $z_{00}$ ($=0$) and $z_{01}$ be the values which minimize $g_0$ and $g_1$, respectively. Now,

$$g_1'(z) = 2B_0z + \delta_2^{\nu_2-1}\psi_2'(z/\delta_2),$$

which equals zero when $z = z_{01}$. Thus, by appropriate choice of $\psi_2$, we may ensure that $z_{00}$ and $z_{01}$ are separated by a distance asymptotic to $\text{const.}\,\delta_2^{\nu_2-1}$.

The argument given during our proof of (3.3) shows that it is impossible to discriminate between $g_0$ and $g_1$. It is therefore also impossible to discriminate between $z_{00}$ and $z_{01}$. Hence no nonparametric estimator of $z_0$ can converge to $z_0$ more rapidly than $\delta_2^{\nu_2-1}$, and the latter is asymptotic to a constant multiple of

$$n^{-2\nu_1(\nu_2-1)/\{(\nu_1+1)\nu_2\}} = \max\big\{n^{-2(\nu_2-1)/(2\nu_2+1)},\ n^{-2\nu_1(\nu_2-1)/\{(\nu_1+1)\nu_2\}}\big\},$$

the above identity holding since $\nu_1 < \nu_2/(\nu_2+1)$. It follows that if (3.4) holds then $a_n\to\infty$.
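A first-order expansion makes the separation between $z_{00}$ and $z_{01}$ explicit. This is a sketch under the following assumptions:

- $\nu_2 > 2$, as for the class $\mathcal{D}_2$;
- $\psi_2'(0)\ne0$;
- $z_{01}$ is the root of $g_1'$ of order $\delta_2^{\nu_2-1}$.

Then $z_{01}/\delta_2 = O(\delta_2^{\nu_2-2})\to0$, so $\psi_2'(z_{01}/\delta_2) = \psi_2'(0) + o(1)$, and

$$0 = g_1'(z_{01}) = 2B_0z_{01} + \delta_2^{\nu_2-1}\{\psi_2'(0) + o(1)\} \;\Longrightarrow\; |z_{01} - z_{00}| = |z_{01}| \sim \frac{|\psi_2'(0)|}{2B_0}\,\delta_2^{\nu_2-1}.$$

Moreover $g_1''(z) = 2B_0 + \delta_2^{\nu_2-2}\psi_2''(z/\delta_2) > 0$ for large $n$, so this turning point is the unique minimum of $g_1$.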

A proof that (3.5) entails $a_n\to\infty$ is similar. It uses the same $g_0$ ($=H$) and $g_1$ ($=H+G$) as above, but changes the class of $f$'s from $\mathcal{F}$ to $\mathcal{F}' = \{F + f : f\in\mathcal{F}\}$. Here $F$ is an appropriate bivariate function which is strictly monotone in both variables. For example, if $\tau_0 = 2$ and $z_0$ is close to zero, then we may take $F(x,z) = (x+1)^2 + (z+1)^2$.

4.  Random  design  case. 

Although the fixed design case is the more important, analogues of our results may be obtained if $(x_i,z_i)$, $1 \le i \le N$, are random variables distributed within the square $[0,1]^2$ according to density $d$, rather than points on a lattice. In the present section we briefly discuss the random design case. The reader is referred to Prakasa Rao (1983, Section 4.2) for details of nonparametric regression estimation.

Assume that $N$ observations $(x_i,y_i,z_i)$ are generated by the model

$$y_i = f(x_i,z_i) + g(z_i)^{1/2}\epsilon_i, \qquad 1 \le i \le N.$$

Here:

- $f$ is $\nu_1$-smooth and $g$ is $\nu_2$-smooth;
- the density $d$ of the pairs $(x_i,z_i)$ is $\max(\nu_1,\nu_2)$-smooth;
- conditional on the $(x_i,z_i)$'s, the $\epsilon_i$'s are independent with zero mean and uniformly bounded second moments.

A kernel estimator of $d$ is

$$\hat d(x,z) = (Nh_1^2)^{-1}\sum_{j=1}^N K_1\{(x_j - x)/h_1,\ (z_j - z)/h_1\}, \tag{4.1}$$

where $K_1$ is a compactly supported bivariate function as in Theorem 3.1 of Stute (1984), such that $\iint x^iz^jK_1(x,z)\,dx\,dz = 1$ if $i = j = 0$ and $0$ if $1 \le i+j \le (\nu_1)$. A kernel estimator of $f$ is

$$\hat f(x,z) = \hat s(x,z)/\hat d(x,z), \qquad \hat s(x,z) = (Nh_1^2)^{-1}\sum_{j=1}^N y_j\,K_1\{(x_j - x)/h_1,\ (z_j - z)/h_1\}. \tag{4.2}$$
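A minimal numerical sketch of the ratio estimator (4.1) and (4.2) follows, for illustration only. It replaces the higher-order kernel $K_1$ of the text by a second-order product Epanechnikov kernel. That assumption is adequate for the linear test function used here but not for general $\nu_1$. The sketch simulates a uniform design, so $d\equiv1$, with a hypothetical mean $f(x,z) = 1 + 2x + z$ and variance $g(z) = \frac12 + z$.

```python
import numpy as np


def k1d(w):
    """Epanechnikov kernel on [-1, 1]."""
    return np.where(np.abs(w) <= 1.0, 0.75 * (1.0 - w ** 2), 0.0)


def kernel_regression(x, z, y, x0, z0, h1):
    """Return (f_hat(x0, z0), d_hat(x0, z0)) in the ratio form of (4.1) and (4.2)."""
    N = len(y)
    w = k1d((x - x0) / h1) * k1d((z - z0) / h1)    # product kernel standing in for K1
    d_hat = w.sum() / (N * h1 ** 2)                # density estimate, as in (4.1)
    s_hat = (w * y).sum() / (N * h1 ** 2)          # numerator s_hat, as in (4.2)
    return s_hat / d_hat, d_hat


rng = np.random.default_rng(1)
N = 40_000
x, z = rng.uniform(size=N), rng.uniform(size=N)          # uniform design, d = 1
y = 1.0 + 2.0 * x + z + np.sqrt(0.5 + z) * rng.standard_normal(N)
f_hat, d_hat = kernel_regression(x, z, y, 0.5, 0.5, h1=0.1)
print(f_hat, d_hat)    # compare with f(0.5, 0.5) = 2.5 and d = 1
```

The common factor $(Nh_1^2)^{-1}$ cancels in $\hat s/\hat d$. Keeping it makes $\hat d$ a genuine density estimate.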


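Let $\hat d_i(x,z)$ and $\hat s_i(x,z)$ be as in (4.1) and (4.2) but with the sums taken only over $j\ne i$. Let $\hat f_i(x,z) = \hat s_i(x,z)/\hat d_i(x,z)$, $r_i = y_i - f(x_i,z_i)$ (not observable) and $\hat r_i = y_i - \hat f_i(x_i,z_i)$. Fix $0 < \delta < \frac12$. Analogues of $\tilde g$ and $\hat g$ are

$$\tilde g(z) = \frac{\sum_{j=1}^N r_j^2\,I(\delta \le x_j \le 1-\delta)\,K_2\{(z_j - z)/h_2\}}{\sum_{j=1}^N I(\delta \le x_j \le 1-\delta)\,K_2\{(z_j - z)/h_2\}}, \qquad \hat g(z) = \frac{\sum_{j=1}^N \hat r_j^2\,I(\delta \le x_j \le 1-\delta)\,K_2\{(z_j - z)/h_2\}}{\sum_{j=1}^N I(\delta \le x_j \le 1-\delta)\,K_2\{(z_j - z)/h_2\}}, \tag{4.3}$$

respectively. Here $K_2$ is a univariate function satisfying $\int z^iK_2(z)\,dz = 1$ for $i = 0$ and $0$ for $1 \le i \le (\nu_2)$.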
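The leave-one-out construction of $\hat r_i$ and the trimmed smoother $\hat g$ defined above can be sketched numerically. The kernels are second-order Epanechnikov kernels standing in for the higher-order $K_1$, $K_2$ of the text. The design, $f$, $g$ and the bandwidths are hypothetical. Setting the diagonal of the weight matrix to zero implements the restriction $j\ne i$. The indicator $I(\delta\le x_j\le1-\delta)$ drops points whose fits would suffer boundary effects in $x$.

```python
import numpy as np


def k1d(w):
    """Epanechnikov kernel on [-1, 1]."""
    return np.where(np.abs(w) <= 1.0, 0.75 * (1.0 - w ** 2), 0.0)


def g_hat_random_design(x, z, y, z_eval, h1, h2, delta):
    """Variance-function estimate at z_eval from leave-one-out residuals."""
    # W[i, j] = K1{(x_j - x_i)/h1, (z_j - z_i)/h1}; the factor (N h1^2)^{-1}
    # cancels in s_i / d_i, so it is omitted.
    W = k1d((x[None, :] - x[:, None]) / h1) * k1d((z[None, :] - z[:, None]) / h1)
    np.fill_diagonal(W, 0.0)                       # sum only over j != i
    denom = W.sum(axis=1)
    f_loo = np.divide(W @ y, denom, out=np.zeros_like(y), where=denom > 0)
    r_hat = y - f_loo                              # r_hat_i = y_i - f_hat_i(x_i, z_i)
    keep = (x >= delta) & (x <= 1.0 - delta)       # I(delta <= x_j <= 1 - delta)
    w2 = keep * k1d((z - z_eval) / h2)
    return (w2 * r_hat ** 2).sum() / w2.sum()


rng = np.random.default_rng(2)
N = 1500
x, z = rng.uniform(size=N), rng.uniform(size=N)
g_true = 0.5 + z                                   # hypothetical g, with g(0.5) = 1
y = 1.0 + 2.0 * x + z + np.sqrt(g_true) * rng.standard_normal(N)
g_est = g_hat_random_design(x, z, y, z_eval=0.5, h1=0.2, h2=0.25, delta=0.2)
print(g_est)
```

The choice $h_1 = \delta$ keeps every retained fit inside $[0,1]^2$ in the $x$ direction, which is the purpose of trimming.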

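Take $h_1 = N^{-1/(2\nu_1+2)}$ and write $a_N = (Nh_1^2)^{-1} + h_1^{2\nu_1}$. By moment calculations applied to $\hat s^{(r,s)}$ and $\hat d^{(r,s)}$ for $0 \le r+s < \nu_1$, using (4.2) we find that for $\delta \le x,z \le 1-\delta$,

$$\{\hat f^{(r,s)}(x,z) - f^{(r,s)}(x,z)\}^2 = O_p\{a_Nh_1^{-2(r+s)}\}. \tag{4.4}$$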

By Theorem 3.1 of Stute (1984),

$$\sup\{|\hat d_i(x,z) - d(x,z)| : 1 \le i \le N,\ \delta \le x,z \le 1-\delta\} = O_p\{(a_N\log N)^{1/2}\}.$$

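Assuming that $d$ is bounded away from zero on $[0,1]^2$, one can show that uniformly in $\delta \le x,z \le 1-\delta$,

$$\hat f_i - f = (\hat s_i - f\hat d_i)/d - (\hat s_i - f\hat d_i)(\hat d_i - d)/d^2 + (\hat s_i - f\hat d_i)(\hat d_i - d)^2/(d^2\hat d_i) + O_p(a_N\log N). \tag{4.5}$$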


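Let $h_2\to0$ such that $Nh_2\to\infty$. By moment calculations applied to each term in $\tilde g^{(t)}$, it follows that for $0 \le t < (\nu_2)$ and $\delta \le z \le 1-\delta$,

$$\{\tilde g^{(t)}(z) - g^{(t)}(z)\}^2 = O_p\{(Nh_2^{2t+1})^{-1} + h_2^{2(\nu_2-t)}\}. \tag{4.6}$$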
Using (4.5), detailed calculations yield

$$\{\hat g^{(t)}(z) - g^{(t)}(z)\}^2 = O_p\{(Nh_2^{2t+1})^{-1} + h_2^{2(\nu_2-t)} + h_2^{-2t}a_N^2\}. \tag{4.7}$$

Equations (4.4), (4.6) and (4.7) are analogues of (2.6), (2.8) and (2.11) respectively.

We may also derive analogues of (1.3), (1.4) and (1.5) by following essentially the arguments given in Section 1. It is necessary to show that

$$\sup_{\delta \le z \le 1-\delta}|\hat g^{(2)}(z) - g^{(2)}(z)| \to 0, \qquad \sup_{\delta \le x,z \le 1-\delta}|\hat f^{(i,j)}(x,z) - f^{(i,j)}(x,z)| \to 0$$

in probability, where $(i,j) = (0,1)$ or $(1,0)$. The trick is to decompose $\hat g^{(2)}$ and $\hat f^{(i,j)}$ into a series of terms, each of which is a ratio of two consistent function estimators. Assume sufficiently many moments of the errors and Hölder continuity of derivatives of $K_1$ and $K_2$. Then uniform consistency of these function estimators may be proved by the "continuity argument"; see for example Stone (1984, foot of p. 1292) and Hall (1985). The technique is intricate and laborious, but conceptually straightforward.

It gives the same rates of convergence exhibited in (1.3), (1.4) and (1.5), under the same conditions on $f$ and $g$. Arguments similar to those in Section 3 may be employed to show that these rates are optimal.